Sentiment Analysis in Snowflake with Titan Text Express using Amazon Bedrock

Sentiment Analysis is a powerful technique that enables organizations to gain valuable insights from textual data, such as customer reviews, social media posts, and survey responses.

This article will guide you through various approaches to performing Sentiment Analysis in Snowflake, starting with Python. We'll explore how to harness the capabilities of Titan Text Express, a cutting-edge large language model offered through Amazon Bedrock.

First, we'll provide an overview of Sentiment Analysis, explaining its significance and potential applications across different industries.

What is Sentiment Analysis?

Sentiment Analysis is a crucial natural language processing (NLP) technique that assigns a numeric sentiment score to unstructured text, quantifying emotions such as positivity, negativity, or neutrality. At the core of Sentiment Analysis are large language models (LLMs), such as Titan Text Express, which have been pre-trained on vast datasets to understand context, semantics, and the subtleties of human language. These models can be fine-tuned to specialize in identifying sentiment in diverse text sources, from social media posts to customer reviews.

Reliable data preparation is critical for effective Sentiment Analysis. Data engineers play a pivotal role, acting as the bridge between raw databases and sophisticated LLMs. They engage in tasks like data cleansing, normalization, and transformation, ensuring that the input text is in a usable format. This involves creating pipelines that efficiently extract, preprocess, and feed text data into the LLM, while also handling issues like missing values and text ambiguities, to ensure robust and accurate sentiment predictions.
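A minimal preprocessing step of this kind might look like the sketch below. It is illustrative only and not part of the pipeline shown later; the function name is hypothetical. It normalizes whitespace and treats missing or empty reviews as skippable.

```python
import re

# Hypothetical helper: normalize a raw review string before it is sent to the LLM
def preprocess_review(text):
    """Return cleaned text, or None for missing/empty input so callers can skip the row."""
    if text is None:
        return None
    # Collapse runs of whitespace (tabs, newlines) into single spaces
    cleaned = re.sub(r"\s+", " ", text).strip()
    return cleaned or None

print(preprocess_review("  Great \n product!  "))  # -> Great product!
```

Real pipelines typically layer on further steps such as deduplication, language detection, and length truncation to fit model input limits.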

Business examples of Sentiment Analysis

  • Social media monitoring: Analyze customer feedback on products or services from social media platforms.
  • Brand reputation management: Track sentiment towards a brand across various online sources.
  • Customer support optimization: Prioritize negative reviews or complaints for prompt resolution.

What is Titan Text Express?

Titan Text Express is a large language model for text generation developed by Amazon. It's designed to help users generate and refine text efficiently, leveraging advanced natural language processing (NLP) techniques. This tool integrates seamlessly into Amazon's suite of services, providing users with easy access to automated content creation and editing features. Titan Text Express utilizes machine learning algorithms to understand context, improve grammar, and enhance writing style.

Pros:

  • Sophisticated NLP capabilities for high-quality text generation.
  • Seamless integration with Amazon's existing services and platforms.
  • Customizable settings for tailored writing outputs.

Cons:

  • Potential for generating contextually inaccurate content.
  • May require ongoing fine-tuning for specific industries.
  • Dependency on AWS infrastructure.

Ideal Use Cases:

  • E-commerce product descriptions.
  • Digital content generation.
  • Automated customer service responses.
  • Conversational chat.
  • Real-time content editing and enhancement.

How to perform Sentiment Analysis in Snowflake with Titan Text Express using Python with the Amazon Bedrock SDK

Prerequisites for the boto3 Amazon Bedrock Python SDK

Start by installing the prerequisite libraries:

python3 -m pip install snowflake-connector-python boto3

Afterwards, load your source data into Snowflake.

Python boto3 for Titan Text Express

The example below involves product reviews, and assumes that the data has been loaded into a database table named "stg_sample_reviews" with four columns: id (the primary key), stars, product and review.

The Python script is shown below. Note that it is good practice to handle credentials more securely than in this simple example: consider a secrets management service rather than environment variables or hardcoded values.

Also note that fetching large result sets with fetchall() can be inefficient and may cause memory issues. For large datasets, use the cursor's fetchmany() method to fetch rows incrementally instead.
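The incremental pattern looks like this. The sketch below uses an in-memory SQLite database purely for illustration, since both drivers follow the Python DB-API and their cursors expose the same fetchmany() method; the table and column names mirror the example that follows.

```python
import sqlite3

# Illustrative sketch: SQLite stands in for Snowflake here, since both drivers
# follow the Python DB-API and expose the same fetchmany() cursor method
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE stg_sample_reviews (id INTEGER, review TEXT)")
demo_cur.executemany(
    "INSERT INTO stg_sample_reviews VALUES (?, ?)",
    [(i, f"sample review {i}") for i in range(10)],
)

demo_cur.execute("SELECT id, review FROM stg_sample_reviews")
batch_size = 4
processed = 0
while True:
    batch = demo_cur.fetchmany(batch_size)  # pulls at most batch_size rows at a time
    if not batch:
        break
    for row in batch:
        processed += 1  # in the real script, call analyze_sentiment(row[1]) here
demo_conn.close()
print(processed)  # -> 10
```

Only one batch of rows is held in memory at a time, so memory use stays flat regardless of table size.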

import os
import snowflake.connector
import logging
import json
import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger("demo")

# Use the Amazon Bedrock InvokeModel API to score a single review
def analyze_sentiment(text):
    bedrock = boto3.client(service_name="bedrock-runtime", region_name="us-east-1")
    model_id = "amazon.titan-text-express-v1"
    prompt = f"""Your job is to analyze online product reviews.
Provide a numeric rating that reflects the overall sentiment of the review.
The rating should be a single number between 1 and 5, where 1 represents the most negative sentiment and 5 represents the most positive sentiment.
Respond with only your numeric rating. Do not include any justification of the rating. Do not use the word 'rating'.
Review: {text}
"""
    body = json.dumps({"inputText": prompt, "textGenerationConfig": {"topP": 0.9, "temperature": 0.5}})
    try:
        response = bedrock.invoke_model(body=body, modelId=model_id, accept='application/json', contentType='application/json')
    except ClientError as err:
        logger.error("Bedrock InvokeModel failed: %s", err)
        raise
    response_body = json.loads(response['body'].read())
    return response_body['results'][0]['outputText'].strip()

# Establish a Snowflake connection
conn = snowflake.connector.connect(
    user=os.environ["SF_USER"],
    password=os.environ["SF_PASSWORD"],
    account=os.environ["SF_ACCOUNT"],
    warehouse=os.environ["SF_WH"],
    database=os.environ["SF_DB"],
    schema=os.environ["SF_SCHEMA"],
    role=os.environ["SF_ROLE"]
)

# Initialize the cursor so the finally block is safe even if creation fails
cur = None
try:
    # Create a cursor object using the connection
    cur = conn.cursor()

    # Create the destination table
    cur.execute('CREATE OR REPLACE TABLE "stg_sample_reviews_genai" ("id" NUMBER(6,0) NOT NULL, "ai_score" VARCHAR(1024) NOT NULL)')

    # Select source rows from the table
    cur.execute('SELECT "id", "review" FROM "stg_sample_reviews"')

    # Fetch all rows from the executed query
    rows = cur.fetchall()

    # Loop through the fetched rows, score each review, and insert the result.
    # A parameterized query avoids quoting errors and SQL injection.
    for row in rows:
        ai_score = analyze_sentiment(row[1]).strip()
        cur.execute('INSERT INTO "stg_sample_reviews_genai" ("id", "ai_score") VALUES (%s, %s)', (row[0], ai_score))

finally:
    # Close the cursor and connection
    if cur:
        cur.close()
    if conn:
        conn.close()

After running the above script, you should find a new table containing the AI-generated review score for every input record. Join it to the original table on the common id column to compare the AI-generated sentiment scores against the original star ratings.

The LLM was asked to score between 1 and 5, so you may choose to classify the scores more broadly as follows:

  • 4 or 5 - Positive
  • 3 - Neutral
  • 1 or 2 - Negative
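That mapping could be applied in Python before loading, as in the sketch below. The helper name is hypothetical; it also strips the surrounding whitespace the model sometimes returns around the number.

```python
# Hypothetical helper: map the model's 1-5 output (often returned as text with
# surrounding whitespace) onto the broader sentiment classes above
def classify_sentiment(score):
    value = int(str(score).strip())
    if value >= 4:
        return "Positive"
    if value == 3:
        return "Neutral"
    return "Negative"

print(classify_sentiment(" 5\n"))  # -> Positive
print(classify_sentiment(3))       # -> Neutral
```

Equivalently, the same bucketing could be done at query time in Snowflake with a CASE expression.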

Sentiment Analysis in Snowflake using Matillion to run Titan Text Express via Amazon Bedrock

In the Matillion Data Productivity Cloud, orchestration pipelines like the one shown in the screenshot below can:

  • Directly extract and load data, or call other pipelines to do so (as shown)
  • Invoke Titan Text Express, with a nominated prompt, against all rows from a nominated table

Sentiment Analysis in Snowflake using Matillion

Data pipelines such as this manage all the connectivity and plumbing between the Snowflake source and target tables, and the LLM.

This allows you to focus on the overall design and architecture, and the data analysis. To compare the AI-generated sentiment scores against the original star review, use a transformation pipeline like the one in the next screenshot.

Checking the results of Sentiment Analysis in Snowflake using Matillion

The data sample shows two of the records. In one case the LLM's rating matches the original sentiment exactly; in the other, the ratings differ slightly. This illustrates the subjective nature of sentiment analysis.

Summary

Matillion is a data pipeline platform that empowers teams to build and manage data pipelines rapidly for AI and analytics at scale. It offers a code-optional UI with pre-built components, or users can code in SQL, Python, or dbt. Matillion integrates with cloud providers, customer data platforms, large language models, and more. It democratizes AI access with no-code connectors, REST API connectivity, parameterization, and hybrid SaaS deployment.

Other key features include Git integration, AI-generated documentation, data lineage, pushdown ELT, vector store connectivity, reverse ETL for AI insights, natural language pipeline building via Copilot, and no-code components for generative AI prompting and retrieval-augmented generation.

For more examples of Matillion's AI components in action, check out our library of AI Videos and Demos.

To try Matillion yourself, sign up for a free trial using your own data.

If you are already a Matillion user or trial customer, you can download the sentiment analysis example shown in the screenshots earlier and run it on your own platform.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.