- Blog
- 06.24.2024
- Product, Data Fundamentals
Sentiment Analysis in Databricks with Meta Llama3 70B using Amazon Bedrock

Sentiment Analysis is a powerful technique that enables organizations to gain valuable insights from textual data, such as product reviews, social media posts, and customer feedback.
This article walks through two ways to perform Sentiment Analysis on large datasets in Databricks using the Meta Llama3 70B model through Amazon Bedrock: a Python script, and a Matillion Data Productivity Cloud pipeline.
We begin with the fundamentals of Sentiment Analysis, then introduce Meta Llama3 70B, a large language model developed by Meta, and discuss its strengths and limitations for complex NLP tasks.
What is Sentiment Analysis?
Sentiment Analysis extracts a sentiment label or numeric score from unstructured text, turning subjective opinions into quantifiable data. By evaluating the polarity of words, phrases, and sentences, it assigns scores that typically range from negative to positive. Large Language Models (LLMs) such as Meta's Llama3 70B have significantly improved sentiment analysis because they interpret words in context, which helps with negation, sarcasm, and other nuanced expressions of sentiment. These models are pre-trained on vast corpora of text and instruction-tuned, so they can score sentiment from a well-written prompt without task-specific training.
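To make the idea of polarity scoring concrete, here is a toy lexicon-based scorer that counts positive and negative words and maps the net polarity onto the same 1 to 5 scale used later in this article. The word lists, the clamping, and the function name polarity_score are hypothetical, chosen only for illustration; this is not how an LLM works internally.

```python
import re

# Hypothetical, deliberately tiny polarity lexicon (illustration only)
POSITIVE = {"great", "love", "excellent", "good", "happy"}
NEGATIVE = {"bad", "terrible", "broken", "poor", "hate"}


def polarity_score(text: str) -> int:
    """Return a 1-5 score where 3 is neutral and higher is more positive."""
    words = re.findall(r"[a-z']+", text.lower())
    net = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    # Clamp the net polarity to [-2, 2], then shift it onto the 1-5 scale
    return 3 + max(-2, min(2, net))


print(polarity_score("Great product, I love it"))          # 5
print(polarity_score("Arrived broken, terrible support"))  # 1
print(polarity_score("It arrived on Tuesday"))             # 3
```

Word counting like this is easily fooled by negation ("not good") or sarcasm, which is exactly where context-aware LLM scoring tends to help.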
For reliable sentiment analysis, data preparation is critical. Data engineers sit between the structured databases and the LLM, and must ensure data is clean, consistent, and enriched with relevant metadata. This means preprocessing text to remove noise, handle missing values, and standardize formats, then building data pipelines that deliver the prepared data to the LLM, so the model works on robust, representative datasets.
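A minimal sketch of the preprocessing step, applied to a single review before it is sent to the LLM. The specific rules (Unicode normalization, HTML unescaping and tag stripping, whitespace collapsing) are assumptions about what counts as noise in typical review data, and clean_review is a hypothetical helper name; adapt both to your own sources.

```python
import html
import re
import unicodedata
from typing import Optional


def clean_review(text: Optional[str]) -> Optional[str]:
    """Normalize one review; return None if nothing useful remains."""
    if text is None:
        return None  # missing value: let the caller decide whether to skip the row
    text = unicodedata.normalize("NFKC", text)  # standardize Unicode forms
    text = html.unescape(text)                  # e.g. "&amp;" becomes "&"
    text = re.sub(r"<[^>]+>", " ", text)        # strip leftover HTML tags
    text = re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace
    return text or None                         # treat empty strings as missing
```

Returning None for empty or missing text lets the pipeline skip those rows instead of paying for an LLM call on nothing.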
Business examples of Sentiment Analysis:
- Social Media Monitoring: Analyze user-generated content on social media platforms to gauge public sentiment towards products, brands, or campaigns.
- Customer Feedback Analysis: Automatically categorize and prioritize customer reviews, support tickets, or survey responses based on sentiment scores.
- Market Research: Gain insights into consumer opinions and preferences by analyzing product reviews, forum discussions, or news articles related to a specific industry or market segment.
What is Meta Llama3 70B?
Meta Llama3 70B is a large language model developed by Meta and part of the Llama 3 model family. It is a decoder-only transformer with approximately 70 billion parameters, pre-trained on a very large corpus of publicly available text (Meta reports over 15 trillion tokens). It is trained with a causal language modeling objective, predicting each next token from the tokens that precede it, and the Instruct variant used in this article is further fine-tuned to follow instructions and hold conversations. The model is designed to generate human-like text, answer questions, and engage in conversations.
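Because Llama3 70B Instruct, the variant used in this article, is tuned for dialogue, it expects input wrapped in a chat template built from special tokens. The sketch below assembles that template for a single user turn with an optional system message. The function name build_llama3_prompt is hypothetical, and the token layout follows Meta's published Llama 3 prompt format; check it against the model card for the version you use.

```python
from typing import Optional


def build_llama3_prompt(user_message: str, system_message: Optional[str] = None) -> str:
    """Wrap a single-turn request in the Llama 3 Instruct chat template."""
    parts = ["<|begin_of_text|>"]
    if system_message:
        parts.append(f"<|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|>")
    parts.append(f"<|start_header_id|>user<|end_header_id|>\n\n{user_message}<|eot_id|>")
    # End with an open assistant header so the model generates the reply next
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)
```

Ending on the assistant header is what turns next-token prediction into "answer the user": the most likely continuation is the assistant's reply.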
Pros:
- Highly accurate and informative responses
- Ability to understand context and follow conversations
- Can generate creative and coherent text
Cons:
- Requires significant computational resources and memory
- May produce biased or toxic responses if not properly fine-tuned
- Limited domain-specific knowledge
Ideal use cases:
- Conversational AI and chatbots
- Language translation and localization
- Text summarization and generation
- Dialogue systems and virtual assistants
- Content creation and writing assistance
How to perform Sentiment Analysis in Databricks with Meta Llama3 70B using Python and the AWS SDK (boto3)
Prerequisites for boto3 and the Databricks SDK
Start by installing the prerequisite libraries (note the Databricks SDK for Python is in beta, version 0.28.0, at the time of writing):
python3 -m pip install databricks-sdk boto3
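Then load your source data into Databricks. You also need AWS credentials that boto3 can discover (for example, from environment variables or a shared credentials file), and access to the Meta Llama3 70B Instruct model enabled in the Amazon Bedrock console for the region you call (the script below uses us-east-1).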
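The script below reads its Databricks connection details from five environment variables and fails with a bare KeyError if one is missing. A small preflight check can report all missing settings at once. The variable names come from the script; missing_settings and REQUIRED_SETTINGS are hypothetical names, and AWS credentials are not checked here because boto3 resolves them itself.

```python
import os
from typing import List, Mapping

# Environment variables read by the Databricks part of the script
REQUIRED_SETTINGS = ["DBKS_HOSTURL", "DBKS_TOKEN", "DBKS_SQL_WHID", "DBKS_CATALOG", "DBKS_SCHEMA"]


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required settings that are unset or empty."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]
```

Calling missing_settings() at the top of the script and stopping early when the list is non-empty gives a clearer error than a KeyError halfway through a run.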
Python script using boto3 for Meta Llama3 70B and the Databricks SDK
The example below involves product reviews, and assumes that the source data has been loaded into a table named "stg_sample_reviews" with four columns: id (the primary key), stars, product and review.
The Python script is shown below. Note it is good practice to handle credentials more securely than shown in this simple example. You might choose to use a secret management service instead of environment variables or hardcoding.
import json
import logging
import os

import boto3
from botocore.exceptions import ClientError
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.sql import StatementParameterListItem

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("demo")

# Create the Bedrock runtime client once and reuse it for every row
bedrock = boto3.client(service_name="bedrock-runtime", region_name="us-east-1")
MODEL_ID = "meta.llama3-70b-instruct-v1:0"

# Use the Amazon Bedrock InvokeModel API to score one review
def analyze_sentiment(text):
    prompt = f"""<|begin_of_text|><|start_header_id|>user<|end_header_id|>
Analyze the sentiment of the following text and return a score from 1 to 5, where 1 represents the most negative sentiment and 5 represents the most positive sentiment: {text}
Respond with a single number only. Do not include any notes, justification, explanation or confidence level, just the number.
<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""
    # max_gen_len caps the reply length; a single digit needs only a few tokens
    body = json.dumps({"prompt": prompt, "top_p": 0.9, "temperature": 0.5, "max_gen_len": 16})
    response = bedrock.invoke_model(body=body, modelId=MODEL_ID,
                                    accept="application/json", contentType="application/json")
    response_body = json.loads(response["body"].read())
    return response_body["generation"].strip()

sqlWhId = os.environ["DBKS_SQL_WHID"]
vCatalog = os.environ["DBKS_CATALOG"]
vSchema = os.environ["DBKS_SCHEMA"]

# Connect to the Databricks workspace
wsclient = WorkspaceClient(
    host=os.environ["DBKS_HOSTURL"],
    token=os.environ["DBKS_TOKEN"],
)

# Create (or re-create) the target table; wait up to 50 seconds (the maximum) for completion
wsclient.statement_execution.execute_statement(warehouse_id=sqlWhId,
    catalog=vCatalog,
    schema=vSchema,
    statement="CREATE OR REPLACE TABLE `stg_sample_reviews_genai` (`id` INT NOT NULL, `ai_score` VARCHAR(1024) NOT NULL)",
    wait_timeout="50s")

# Fetch rows from the source table (only the first result chunk is read, which suits small sample tables)
xsr = wsclient.statement_execution.execute_statement(warehouse_id=sqlWhId,
    catalog=vCatalog,
    schema=vSchema,
    statement="SELECT `id`, `review` FROM `stg_sample_reviews`",
    wait_timeout="50s")
rows = xsr.result.data_array if xsr.result and xsr.result.data_array else []

for r in rows:
    if r[1] is None:
        continue  # skip rows with no review text
    try:
        ai_score = analyze_sentiment(r[1])
    except ClientError as err:
        logger.error("Bedrock call failed for id %s: %s", r[0], err)
        continue
    # Values in data_array are strings; bind ai_score as STRING to match its VARCHAR column
    wsclient.statement_execution.execute_statement(warehouse_id=sqlWhId,
        catalog=vCatalog,
        schema=vSchema,
        statement="INSERT INTO `stg_sample_reviews_genai` (`id`, `ai_score`) VALUES (:id, :ai_score)",
        parameters=[StatementParameterListItem(name="id", type="INT", value=r[0]),
                    StatementParameterListItem(name="ai_score", type="STRING", value=ai_score)])
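Even when told to respond with a single number, a model can occasionally add punctuation or a few words, which is one reason the script stores the raw reply in a VARCHAR column. If you want a clean integer, a sketch of a hypothetical parse_score helper follows: it takes the first standalone digit from 1 to 5 in the reply and returns None otherwise. The regular expression is an assumption about typical reply shapes, not a guarantee.

```python
import re
from typing import Optional


def parse_score(generation: str) -> Optional[int]:
    """Extract the first standalone digit 1-5 from the model's reply, else None."""
    match = re.search(r"\b([1-5])\b", generation)
    return int(match.group(1)) if match else None
```

You could call parse_score(analyze_sentiment(text)) and log or skip rows where it returns None; with parsing in place, ai_score could become an INT column.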
After running the Python script, you should find a new table, "stg_sample_reviews_genai", containing the AI-generated review score for each input record. Join this table to the original on the common id column to compare the AI-generated sentiment scores against the original star ratings.
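One way to quantify that comparison is sketched below: a join query you could run with execute_statement as in the script, plus a helper that computes the exact-match rate and the mean absolute difference between stars and ai_score. It assumes the stars column holds whole-number ratings on the same 1 to 5 scale as the prompt; JOIN_SQL and compare_scores are hypothetical names.

```python
from typing import Iterable, Optional, Tuple

# Join the AI scores back to the source reviews on the shared id column
JOIN_SQL = """
SELECT r.`id`, r.`stars`, g.`ai_score`
FROM `stg_sample_reviews` r
JOIN `stg_sample_reviews_genai` g ON r.`id` = g.`id`
"""


def compare_scores(pairs: Iterable[Tuple[str, str]]) -> Optional[Tuple[float, float]]:
    """Return (exact-match rate, mean absolute difference) over comparable rows."""
    diffs = []
    for stars, ai_score in pairs:
        try:
            diffs.append(abs(int(stars) - int(ai_score)))
        except (TypeError, ValueError):
            continue  # skip rows where either value is missing or not a whole number
    if not diffs:
        return None  # nothing comparable
    exact = sum(d == 0 for d in diffs) / len(diffs)
    return exact, sum(diffs) / len(diffs)
```

Since data_array returns every value as a string, the rows from JOIN_SQL can be passed as [(row[1], row[2]) for row in rows].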
The LLM was asked to score between 1 and 5, so you may choose to classify the scores more broadly as follows:
- 4 or 5 - Positive
- 3 - Neutral
- 1 or 2 - Negative
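The classification above translates directly into a small mapping function. The "Unknown" fallback for missing or out-of-range scores is an assumption, not part of the scheme above, and sentiment_label is a hypothetical name.

```python
from typing import Optional


def sentiment_label(score: Optional[int]) -> str:
    """Map a 1-5 score to the broad Positive / Neutral / Negative classes."""
    if score in (4, 5):
        return "Positive"
    if score == 3:
        return "Neutral"
    if score in (1, 2):
        return "Negative"
    return "Unknown"  # missing or out-of-range score
```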
Sentiment Analysis in Databricks using Matillion to run Meta Llama3 70B via Amazon Bedrock
In the Matillion Data Productivity Cloud, orchestration pipelines like the one shown in the screenshot below can:
- Directly extract and load data, or call other pipelines to do so (as shown)
- Invoke Meta Llama3 70B, with a nominated prompt, against all rows from a nominated table
[Screenshot: Sentiment Analysis in Databricks using Matillion]
Data pipelines such as this manage all the connectivity and plumbing between the Databricks source and target tables, and the LLM.
This allows you to focus on the overall design and architecture, and the data analysis. To compare the AI-generated sentiment scores against the original star review, use a transformation pipeline like the one in the next screenshot.
[Screenshot: Checking the results of Sentiment Analysis in Databricks using Matillion]
The data sample shows two of the records. In one, the LLM's score matches the original star rating exactly; in the other, the two ratings differ slightly, which illustrates the subjective nature of sentiment analysis.
Summary
Matillion is a data pipeline platform that empowers teams to build and manage data pipelines rapidly for AI and analytics at scale. It offers a code-optional UI with pre-built components or coding in SQL, Python, and dbt. Matillion integrates with cloud platforms, customer data platforms, large language models, and more.
It democratizes AI access with no-code connectors, REST API connectivity, parameterization, and hybrid SaaS deployment. Matillion provides data lineage, pushdown ELT, AI components for generative AI prompting and vector store connectivity. It enables reverse ETL of AI-generated insights and natural language pipeline building with Copilot. Matillion unifies augmented data engineering for productivity and collaboration.
For more examples of Matillion's AI components in action, check out our library of AI Videos and Demos.
To try Matillion yourself, using your own data, sign up for a free trial.
If you are already a Matillion user or trial customer, you can download the sentiment analysis example shown in the screenshots earlier, and run it on your own platform.