The Agentic Advantage series: Real-world AI success stories from leading CDAOs

Register Now

Sentiment Analysis in Amazon Redshift with Titan Text Express using Amazon Bedrock

Sentiment Analysis, the process of determining the emotional tone behind a piece of text, has become an increasingly valuable tool for businesses seeking to gain insights from customer feedback and online reviews.

This article will guide you through various techniques for performing Sentiment Analysis on data stored in Amazon Redshift, starting with a Python-based approach. We will introduce Titan Text Express, a powerful text analysis engine, and explore how it can be integrated with Amazon Redshift through Amazon Bedrock.

To begin, we will provide an overview of Sentiment Analysis, explaining its significance and the underlying principles that enable machines to comprehend and categorize human emotions expressed in written form.

What is Sentiment Analysis?

Sentiment Analysis involves extracting a numeric sentiment score from unstructured text to determine emotional tone, be it positive, negative, or neutral. By leveraging large language models (LLMs) such as Titan Text Express, this process becomes more precise. These models are adept at understanding and interpreting human language nuances, enhancing sentiment prediction accuracy.

Proper data preparation is critical for reliable Sentiment Analysis. Data engineers play a crucial role. They must facilitate efficient data pipelines that connect databases to LLMs, ensuring the data flow is seamless and the input quality is maintained. This involves writing complex ETL (Extract, Transform, Load) scripts and optimizing storage systems for quick access, thereby bridging the gap between raw data repositories and sophisticated machine learning models.

Business examples of Sentiment Analysis

  • Social Media Monitoring: Analyzing customer feedback on social media platforms to gauge sentiment towards products or services.
  • Customer Support: Automatically classifying support tickets or emails based on sentiment to prioritize and route negative feedback for immediate attention.
  • Market Research: Analyzing product reviews or survey responses to understand consumer sentiment towards brands or competitors.

What is Titan Text Express?

Amazon's Titan Text Express is a generative AI tool designed for natural language processing tasks. Leveraging advanced machine learning algorithms, it can generate human-like text based on input prompts, making it suitable for a variety of applications including content creation, automated responses, and data analysis. The engine is built upon sophisticated deep learning models trained on extensive datasets, enabling it to deliver high accuracy and contextual relevance.

Pros:

  • High-quality, natural language generation.
  • Versatile use across different domains.
  • Scalable to handle large volumes of tasks.

Cons:

  • Potential ethical concerns regarding misuse.
  • Requires significant computational resources.
  • May struggle with nuanced or highly specialized topics.

Ideal Use Cases:

  • Automating customer service interactions.
  • Drafting and editing content for websites and social media.
  • Generating reports and summaries from large datasets.
  • Enhancing virtual assistants with more natural conversation capabilities.

How to perform Sentiment Analysis in Redshift with Titan Text Express using Python with the Amazon Bedrock SDK

Prerequisites for the boto3 Amazon Bedrock Python SDK

Start by installing the prerequisite libraries:

python3 -m pip install psycopg2-binary boto3

Afterwards load your source data into Redshift.

Python boto3 for Titan Text Express

The example below involves product reviews, and assumes that the source data has been loaded into a table named "stg_sample_reviews" with four columns: id (the primary key), stars, product and review.

The Python script is shown below. Note it is good practice to handle credentials more securely than shown in this simple example. You might choose to use a secret management service instead of environment variables or hardcoding.

Also please note that handling large amounts of data using fetchall() can be inefficient, and may result in memory issues. For large datasets, you should use a cursor to fetch rows incrementally using the fetchmany() method instead.

If you are working in a schema other than "public" you will need the -c connection option to specify the object search path. Set your RS_OPTIONS environment variable to "-c search_path=yourSchemaName,public" replacing the schema name with your own. The newly created table will be added to this named schema.

import os
import psycopg2
import logging
import json
import boto3
import botocore
from botocore.exceptions import ClientError

logger = logging.getLogger("demo")

# Use the Amazon Bedrock InvokeModel API
def analyze_sentiment(text):
    abc = boto3.client(service_name="bedrock-runtime", region_name="us-east-1")
    model_id = "amazon.titan-text-express-v1"
    prompt = f"""Your job is to analyze online product reviews.
Provide a numeric rating that reflects the overall sentiment of the review.
The rating should be a single number between 1 and 5, where 1 represents the most negative sentiment and 5 represents the most positive sentiment.
Respond with only your numeric rating. Do not include any justification of the rating. Do not use the word 'rating'.
Review: {text}
"""
    body = json.dumps({"inputText": prompt, "textGenerationConfig" : {"topP":0.9, "temperature":0.5}})
    response = abc.invoke_model(body=body, modelId=model_id, accept='application/json', contentType='application/json')
    response_body = json.loads(response.get('body').read())
    return response_body.get('results')[0].get('outputText')

# Database connection parameters
db_params = {
    'dbname': os.environ["RS_DBNAME"],
    'user': os.environ["RS_USER"],
    'password': os.environ["RS_PASSWORD"],
    'host': os.environ["RS_HOST"],
    'port': os.environ["RS_PORT"],
    'options': os.environ["RS_OPTIONS"]
}

# Create a connection to the Redshift database
try:
    conn = psycopg2.connect(**db_params)
except Exception as e:
    print(f"Unable to connect to the database: {e}")
    exit(1)

conn.autocommit = True

# Create a cursor object
cur = conn.cursor()

# Fetch all the rows from the source table
query = f'SELECT "id", "review" FROM "stg_sample_reviews"'

try:
    # Create the table to hold the results, if it does not already exist
    cur.execute('''CREATE TABLE IF NOT EXISTS "stg_sample_reviews_genai"
 ("id" INT NOT NULL, "ai_score" VARCHAR(1024) NOT NULL)''')

    cur.execute(f'DELETE FROM "stg_sample_reviews_genai"')

    # Execute the query
    cur.execute(query)

    # Fetch and process each row
    for row in cur.fetchall():
        ai_score = analyze_sentiment(row)
        cur.execute(f'INSERT INTO "stg_sample_reviews_genai" ("id", "ai_score") VALUES ({row[0]}, {ai_score})')

except Exception as e:
    print(f"SQL error: {e}")
finally:
    cur.close()
    conn.close()

After running the above script, you should find a new table has been created, which contains the AI-generated review score for every input record. Join this table to the original on the common id column to compare the AI-generated sentiment scores against the original star review.

The LLM was asked to score between 1 and 5, so you may choose to classify the scores more broadly as follows:

  • 4 or 5 - Positive
  • 3 - Neutral
  • 1 or 2 - Negative

Sentiment Analysis in Redshift using Matillion to run Titan Text Express via Amazon Bedrock

In the Matillion Data Productivity Cloud, orchestration pipelines like the one shown in the screenshot below can:

  • Directly extract and load data, or call other pipelines to do so (as shown)
  • Invoke Titan Text Express, with a nominated prompt, against all rows from a nominated table

Sentiment Analysis in Redshift using Matillion

Data pipelines such as this manage all the connectivity and plumbing between the Redshift source and target tables, and the LLM.

This allows you to focus on the overall design and architecture, and the data analysis. To compare the AI-generated sentiment scores against the original star review, use a transformation pipeline like the one in the next screenshot.

Checking the results of Sentiment Analysis in Redshift using Matillion

The data sample shows two of the records. In one case the LLM's decision matches the original sentiment identically, but in the other record the ratings differ slightly. This is an example of the subjective nature of sentiment analysis.

Summary

Matillion is a data pipeline platform that empowers teams to build and manage data pipelines rapidly for AI and analytics at scale. It offers a code-optional UI with pre-built components or coding in SQL, Python, and dbt. Matillion integrates with cloud data platforms, customer data platforms, large language models, and more. It democratizes AI access with no-code connectors, REST API connectivity, parameterization, and hybrid SaaS deployment.

Matillion provides data lineage, pushdown ELT, AI components for prompting and vector stores, AI documentation, and reverse ETL for insights. Its AI copilot allows building pipelines using natural language for augmented data engineering.

For more examples of Matillion's AI components in action, check out our library of AI Videos and Demos.

To try Matillion yourself, using your own data, sign up for a free trial.

If you are already a Matillion user or trial customer, you can download the sentiment analysis example shown in the screenshots earlier, and run it on your own platform.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.