- Blog
- 06.24.2024
- Data Fundamentals, Product
Sentiment Analysis in Databricks with Anthropic Claude 3 Sonnet using Amazon Bedrock

Sentiment analysis is a powerful technique that enables organizations to gain valuable insights from unstructured data sources, such as customer reviews, social media posts, and survey responses. This article will explore various methods for performing sentiment analysis using Python on the Databricks platform, leveraging the capabilities of Anthropic Claude 3 Sonnet through Amazon Bedrock.
We will begin by introducing the concept of sentiment analysis, its applications, and the challenges associated with accurately analyzing textual data. We will then turn to Anthropic Claude 3 Sonnet, a state-of-the-art large language model well suited to sentiment analysis tasks. The article will guide you through the process of integrating Claude 3 Sonnet with Databricks, enabling you to harness its capabilities for your sentiment analysis needs.
What is Sentiment Analysis?
Sentiment Analysis is a technique used to extract a numeric sentiment score from unstructured text, quantifying subjective information such as opinions, emotions, and attitudes. By analyzing text, Sentiment Analysis enables the transformation of qualitative data into actionable metrics, crucial for understanding customer feedback, market trends, and social media activities.
Large Language Models (LLMs) such as Anthropic's Claude 3 Sonnet are instrumental in performing Sentiment Analysis. These models leverage deep learning architectures to grasp complex linguistic nuances, undertaking tasks from tokenization to contextual embedding. By training on vast repositories of text data, LLMs can discern subtleties in sentiment, assigning scores that summarize the overall emotional tone of the input text.
For effective Sentiment Analysis, data preparation is paramount. Data engineers must ensure the reliability and quality of data by cleaning, normalizing, and pre-processing the text. They play a vital role in interfacing between the database infrastructure and the LLM, enabling seamless data flow. This involves optimizing queries, managing data pipelines, and ensuring that the textual data fed into the model is both relevant and noise-free, thus guaranteeing more accurate sentiment predictions.
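As a minimal illustration of the kind of pre-processing described above, the hypothetical `clean_text` helper below (not part of the pipeline later in this article) strips stray HTML tags and collapses whitespace before a review is sent to the model:

```python
import re

def clean_text(raw: str) -> str:
    """Normalize a review before sending it for sentiment scoring."""
    text = re.sub(r"<[^>]+>", "", raw)   # drop stray HTML tags
    text = re.sub(r"\s+", " ", text)     # collapse runs of whitespace
    return text.strip()

print(clean_text("  Great   <b>phone</b>,\nfast delivery! "))
# → Great phone, fast delivery!
```

Real pipelines typically add further steps (deduplication, language detection, length limits), but the principle is the same: feed the model clean, relevant text.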
Business examples of Sentiment Analysis
- Social Media Monitoring: Analyzing customer feedback on social media platforms to gauge brand perception and identify potential issues.
- Product Reviews Analysis: Evaluating customer reviews to understand sentiment towards products, identify areas for improvement, and inform product development decisions.
- Customer Support Optimization: Analyzing customer support interactions to identify dissatisfied customers, prioritize responses, and improve overall customer experience.
What is Anthropic Claude 3 Sonnet?
Anthropic Claude 3 Sonnet is a large language model developed by Anthropic, a company focused on building safe and ethical artificial intelligence systems. It is a version of the company's Claude model, trained using constitutional AI principles to align it with human values and preferences.
Pros:
- Impressive language understanding and generation capabilities
- Emphasis on safety and ethics
- Ability to engage in open-ended conversations
Cons:
- Like other large language models, it can generate biased or inconsistent outputs
- Training data and model details are not fully transparent
Ideal use cases include:
- Interactive assistants for writing and research
- Analysis tasks that benefit from natural language processing capabilities while prioritizing safety and alignment with human values.
How to perform Sentiment Analysis in Databricks with Anthropic Claude 3 Sonnet using Python with the Amazon Bedrock SDK
Prerequisites for the boto3 Amazon Bedrock Python SDK
Start by installing the prerequisite libraries (note the Databricks SDK for Python is in beta, version 0.28.0, at the time of writing):
python3 -m pip install databricks-sdk boto3
Then load your source data into Databricks.
Python boto3 for Anthropic Claude 3 Sonnet and Databricks SDK
The example below involves product reviews, and assumes that the source data has been loaded into a table named "stg_sample_reviews" with four columns: id (the primary key), stars, product and review.
The Python script is shown below. Note that it is good practice to handle credentials more securely than in this simple example: consider a secret management service rather than environment variables or hardcoded values.
import os
import logging
import boto3
from botocore.exceptions import ClientError
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.sql import StatementParameterListItem

logger = logging.getLogger("demo")

# Use the Amazon Bedrock Converse API to score one review
def analyze_sentiment(text):
    client = boto3.client(service_name="bedrock-runtime", region_name="us-east-1")
    model_id = "anthropic.claude-3-sonnet-20240229-v1:0"
    prompt = f"""Provide a numeric rating that reflects the overall sentiment of the review.
The rating should be a single number between 1 and 5, where 1 represents the most negative sentiment and 5 represents the most positive sentiment.
Respond with only your numeric rating. Do not include any justification of the rating.
Review: {text}
"""
    try:
        response = client.converse(
            modelId=model_id,
            messages=[{"role": "user", "content": [{"text": prompt}]}],
            system=[{"text": "Your job is to analyze online product reviews."}],
            inferenceConfig={"temperature": 0.5},
            additionalModelRequestFields={"top_k": 200},
        )
    except ClientError as err:
        logger.error("Bedrock invocation failed: %s", err)
        raise
    return response["output"]["message"]["content"][0]["text"].strip()

sqlWhId = os.environ["DBKS_SQL_WHID"]
vCatalog = os.environ["DBKS_CATALOG"]
vSchema = os.environ["DBKS_SCHEMA"]

# Connect to the Databricks workspace
wsclient = WorkspaceClient(
    host=os.environ["DBKS_HOSTURL"],
    token=os.environ["DBKS_TOKEN"],
)

# Create the target table for the AI-generated scores
ddl = wsclient.statement_execution.execute_statement(
    warehouse_id=sqlWhId,
    catalog=vCatalog,
    schema=vSchema,
    statement="CREATE OR REPLACE TABLE `stg_sample_reviews_genai` (`id` INT NOT NULL, `ai_score` VARCHAR(1024) NOT NULL)",
)

# Fetch rows from the source table
xsr = wsclient.statement_execution.execute_statement(
    warehouse_id=sqlWhId,
    catalog=vCatalog,
    schema=vSchema,
    statement="SELECT `id`, `review` FROM `stg_sample_reviews`",
)

for r in xsr.result.data_array:
    ai_score = analyze_sentiment(r[1])
    print(f"ID {r[0]}: Score: {ai_score}")
    # Insert the score for this review; ai_score is bound as a string,
    # matching the VARCHAR target column
    dml = wsclient.statement_execution.execute_statement(
        warehouse_id=sqlWhId,
        catalog=vCatalog,
        schema=vSchema,
        statement="INSERT INTO `stg_sample_reviews_genai` (`id`, `ai_score`) VALUES (:id, :ai_score)",
        parameters=[
            StatementParameterListItem.from_dict({"name": "id", "type": "INT", "value": r[0]}),
            StatementParameterListItem.from_dict({"name": "ai_score", "type": "STRING", "value": ai_score}),
        ],
    )
After running the above script, you should find a new table has been created, which contains the AI-generated review score for every input record. Join this table to the original on the common id column to compare the AI-generated sentiment scores against the original star review.
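To illustrate that comparison conceptually, here is a hypothetical in-memory join on the id column, using made-up sample rows rather than a live Databricks query:

```python
# Hypothetical sample data: original star ratings and AI-generated scores
original = [(1, 5), (2, 2), (3, 4)]          # (id, stars) from stg_sample_reviews
ai_scores = [(1, "5"), (2, "3"), (3, "4")]   # (id, ai_score) from stg_sample_reviews_genai

score_by_id = {rid: int(score) for rid, score in ai_scores}

# Join on id and flag rows where the LLM agrees with the star rating
comparison = [
    (rid, stars, score_by_id[rid], stars == score_by_id[rid])
    for rid, stars in original
    if rid in score_by_id
]
for rid, stars, ai, match in comparison:
    print(f"id={rid} stars={stars} ai={ai} match={match}")
```

In practice you would express the same join in SQL against the two Databricks tables; this sketch just shows the shape of the result.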
The LLM was asked to score between 1 and 5, so you may choose to classify the scores more broadly as follows:
- 4 or 5 - Positive
- 3 - Neutral
- 1 or 2 - Negative
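This classification can be expressed as a simple helper function; the sketch below uses exactly the thresholds listed above:

```python
def classify(score: int) -> str:
    """Map a 1-5 sentiment score to a broad category."""
    if score >= 4:
        return "Positive"
    if score == 3:
        return "Neutral"
    return "Negative"

print(classify(5), classify(3), classify(1))
# → Positive Neutral Negative
```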
Sentiment Analysis in Databricks using Matillion to run Anthropic Claude 3 Sonnet via Amazon Bedrock
In the Matillion Data Productivity Cloud, orchestration pipelines like the one shown in the screenshot below can:
- Directly extract and load data, or call other pipelines to do so (as shown)
- Invoke Anthropic Claude 3 Sonnet, with a nominated prompt, against all rows from a nominated table
Sentiment Analysis in Databricks using Matillion
Data pipelines such as this manage all the connectivity and plumbing between the Databricks source and target tables, and the LLM.
This allows you to focus on the overall design and architecture, and the data analysis. To compare the AI-generated sentiment scores against the original star review, use a transformation pipeline like the one in the next screenshot.
Checking the results of Sentiment Analysis in Databricks using Matillion
The data sample shows two of the records. In one case the LLM's score matches the original sentiment exactly; in the other, the ratings differ slightly. This is an example of the subjective nature of sentiment analysis.
Summary
Matillion is a data pipeline platform that enables data teams to build and manage pipelines efficiently for AI and analytics applications at scale. It offers a code-optional UI with pre-built components, as well as support for coding in SQL, Python, and DBT.
Matillion integrates with cloud platforms, customer data platforms, large language models, and more. It provides Git integration, AI-generated documentation, numerous no-code connectors, and the ability to build custom REST API connectors. Matillion supports variables for parameterization, hybrid SaaS deployment, data lineage tracking, pushdown ELT, and vector store connectivity. It also includes components for generative AI prompting, retrieval-augmented generation, and reverse ETL of AI insights.
For more examples of Matillion's AI components in action, check out our library of AI Videos and Demos.
To try Matillion yourself, using your own data, sign up for a free trial.
If you are already a Matillion user or trial customer, you can download the sentiment analysis example shown in the screenshots earlier, and run it on your own platform.