Blog | Data Fundamentals | 06.24.2024
Sentiment Analysis in Amazon Redshift with Anthropic Claude 3 Sonnet using Amazon Bedrock

Sentiment Analysis is a powerful technique that enables organizations to gain valuable insights from unstructured data, such as product reviews, social media posts, and customer feedback. This article will explore various methods for performing Sentiment Analysis in Redshift, Amazon's cloud-based data warehousing solution, using Anthropic Claude 3 Sonnet through Amazon Bedrock.
We will begin by introducing sentiment analysis and its applications, then walk through a step-by-step guide to performing it with Python, demonstrating how Claude 3 Sonnet, accessed through Amazon Bedrock, can streamline the sentiment analysis process within Redshift.
What is Sentiment Analysis?
Sentiment Analysis extracts a numeric sentiment score from unstructured text, enabling quantitative insights into subjective data. Leveraging natural language processing (NLP) methodologies, sentiment analysis evaluates text data for underlying emotional tones, thus generating a sentiment score that can range from negative to positive.
Large Language Models (LLMs), like Claude 3 Sonnet, have revolutionized Sentiment Analysis by utilizing deep learning techniques to understand context and nuances in human language. These models are trained on vast corpora and can discern sentiments with a high degree of accuracy by capturing complex patterns and subtleties within the text.
Reliable data preparation is critical in Sentiment Analysis. Data engineers play a pivotal role in ensuring high-quality, clean, and well-annotated text data is fed into the LLMs. They are responsible for interfacing between the raw data stored in databases and the LLM, employing ETL (Extract, Transform, Load) processes to refine and structure the input text. This process enhances the model's performance, ensuring accurate sentiment scoring and meaningful analysis.
Business Examples of Sentiment Analysis
- E-commerce: Analyze customer reviews to gauge product satisfaction and identify areas for improvement.
- Social Media Monitoring: Track public sentiment towards brands, campaigns, or events by analyzing social media posts.
- Customer Support: Automatically categorize support tickets based on sentiment to prioritize and route negative feedback for immediate attention.
What is Anthropic Claude 3 Sonnet?
Anthropic Claude 3 Sonnet is a large language model trained by Anthropic using its Constitutional AI approach. It is designed to engage in open-ended dialogue, answer questions, and assist with a wide range of tasks. Sonnet builds upon earlier versions of Claude, with improvements in factual accuracy, reasoning ability, and safety.
Pros:
- Broad knowledge base and strong language understanding
- Commitment to beneficial and ethical AI development
- Ability to engage in nuanced and contextual communication
Cons:
- Like any language model, it may occasionally produce biased or inaccurate outputs
- Its capabilities and limitations are still being explored and understood
Ideal use cases:
- Research and analysis across various domains
- Creative writing and ideation
- Task assistance and open-ended dialogue in an educational or professional setting
How to perform Sentiment Analysis in Redshift with Anthropic Claude 3 Sonnet using Python with the Amazon Bedrock SDK
Prerequisites for the boto3 Amazon Bedrock Python SDK
Start by installing the prerequisite libraries:
python3 -m pip install psycopg2-binary boto3
Afterwards, load your source data into Redshift.
Python boto3 for Anthropic Claude 3 Sonnet
The example below involves product reviews, and assumes that the source data has been loaded into a table named "stg_sample_reviews" with four columns: id (the primary key), stars, product and review.
The Python script is shown below. Note that it is good practice to handle credentials more securely than in this simple example: consider a secrets management service instead of environment variables or hardcoded values.
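As a sketch of that more secure approach, connection settings could be stored as a single JSON secret in AWS Secrets Manager and fetched at startup. The secret name "redshift/credentials" below is a hypothetical example; in real use, secrets_client would be boto3.client("secretsmanager"), passed in here so the helper is easy to test.

```python
import json

def load_db_params(secrets_client, secret_name="redshift/credentials"):
    """Fetch Redshift connection settings stored as a single JSON secret.

    In real use, secrets_client would be boto3.client("secretsmanager");
    it is injected as a parameter so the helper can be exercised without AWS.
    """
    resp = secrets_client.get_secret_value(SecretId=secret_name)
    return json.loads(resp["SecretString"])
```

The returned dictionary can then be passed straight to psycopg2.connect(**db_params).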
Also please note that handling large amounts of data using fetchall() can be inefficient, and may result in memory issues. For large datasets, you should use a cursor to fetch rows incrementally using the fetchmany() method instead.
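The incremental approach can be sketched as a small generator that works with any DB-API cursor; the batch_size of 500 is an arbitrary illustrative value.

```python
def fetch_in_batches(cursor, batch_size=500):
    """Yield rows from a DB-API cursor without loading the full result set into memory."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield from rows
```

The main processing loop shown later could then iterate over fetch_in_batches(cur) instead of cur.fetchall().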
If you are working in a schema other than "public" you will need the -c connection option to specify the object search path. Set your RS_OPTIONS environment variable to "-c search_path=yourSchemaName,public" replacing the schema name with your own. The newly created table will be added to this named schema.
import os

import boto3
import psycopg2

# Use the Amazon Bedrock Converse API
def analyze_sentiment(text):
    bedrock = boto3.client(service_name="bedrock-runtime", region_name="us-east-1")
    model_id = "anthropic.claude-3-sonnet-20240229-v1:0"
    prompt = f"""Provide a numeric rating that reflects the overall sentiment of the review.
The rating should be a single number between 1 and 5, where 1 represents the most negative sentiment and 5 represents the most positive sentiment.
Respond with only your numeric rating. Do not include any justification of the rating.
Review: {text}
"""
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        system=[{"text": "Your job is to analyze online product reviews."}],
        inferenceConfig={"temperature": 0.5},
        additionalModelRequestFields={"top_k": 200},
    )
    return response["output"]["message"]["content"][0]["text"].strip()

# Database connection parameters
db_params = {
    "dbname": os.environ["RS_DBNAME"],
    "user": os.environ["RS_USER"],
    "password": os.environ["RS_PASSWORD"],
    "host": os.environ["RS_HOST"],
    "port": os.environ["RS_PORT"],
    "options": os.environ["RS_OPTIONS"],
}

# Create a connection to the Redshift database
try:
    conn = psycopg2.connect(**db_params)
except Exception as e:
    print(f"Unable to connect to the database: {e}")
    exit(1)
conn.autocommit = True

# Create a cursor object
cur = conn.cursor()

# Fetch the id and review text from the source table
query = 'SELECT "id", "review" FROM "stg_sample_reviews"'

try:
    # Create the table to hold the results, if it does not already exist
    cur.execute('''CREATE TABLE IF NOT EXISTS "stg_sample_reviews_genai"
                   ("id" INT NOT NULL, "ai_score" VARCHAR(1024) NOT NULL)''')
    cur.execute('DELETE FROM "stg_sample_reviews_genai"')
    # Execute the query
    cur.execute(query)
    # Fetch and process each row, scoring the review text (the second column)
    for row in cur.fetchall():
        ai_score = analyze_sentiment(row[1])
        # Use a parameterized INSERT rather than string interpolation
        cur.execute(
            'INSERT INTO "stg_sample_reviews_genai" ("id", "ai_score") VALUES (%s, %s)',
            (row[0], ai_score),
        )
except Exception as e:
    print(f"SQL error: {e}")
finally:
    cur.close()
    conn.close()
After running the above script, you should find a new table has been created, which contains the AI-generated review score for every input record. Join this table to the original on the common id column to compare the AI-generated sentiment scores against the original star review.
The LLM was asked to score between 1 and 5, so you may choose to classify the scores more broadly as follows:
- 4 or 5 - Positive
- 3 - Neutral
- 1 or 2 - Negative
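This bucketing can be expressed as a small helper; a sketch, assuming the model returned a parseable digit as the prompt instructed:

```python
def classify_sentiment(ai_score):
    """Map a 1-5 rating string, as returned by the model, to a broad sentiment label."""
    rating = int(ai_score)
    if rating >= 4:
        return "Positive"
    if rating == 3:
        return "Neutral"
    return "Negative"
```

In practice you may also want to guard against malformed model output (for example, catching ValueError from int()) before classifying.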
Sentiment Analysis in Redshift using Matillion to run Anthropic Claude 3 Sonnet via Amazon Bedrock
In the Matillion Data Productivity Cloud, orchestration pipelines like the one shown in the screenshot below can:
- Directly extract and load data, or call other pipelines to do so (as shown)
- Invoke Anthropic Claude 3 Sonnet, with a nominated prompt, against all rows from a nominated table
Sentiment Analysis in Redshift using Matillion
Data pipelines such as this manage all the connectivity and plumbing between the Redshift source and target tables, and the LLM.
This allows you to focus on the overall design and architecture, and the data analysis. To compare the AI-generated sentiment scores against the original star review, use a transformation pipeline like the one in the next screenshot.
Checking the results of Sentiment Analysis in Redshift using Matillion
The data sample shows two of the records. In one case the LLM's decision matches the original sentiment identically, but in the other record the ratings differ slightly. This is an example of the subjective nature of sentiment analysis.
Summary
Matillion is a data pipeline platform that empowers teams to rapidly build and manage data pipelines for AI and analytics at scale. It offers a code-optional UI with pre-built components, or users can code in SQL, Python, or dbt. Matillion integrates with cloud data platforms, customer data platforms, large language models, and more. It democratizes access to AI with no-code components for generative AI prompting, retrieval-augmented generation, and vector store connectivity.
Key features include Git integration, AI-generated documentation, custom REST API connectors, parameterization with variables, data lineage tracking, pushdown ELT, and a copilot for natural language pipeline building. Matillion enables augmented data engineering with AI capabilities seamlessly integrated into data pipelines.
For more examples of Matillion's AI components in action, check out our library of AI Videos and Demos.
To try Matillion yourself, using your own data, sign up for a free trial.
If you are already a Matillion user or trial customer, you can download the sentiment analysis example shown in the screenshots earlier and run it on your own platform.