
Sentiment Analysis in Amazon Redshift with Meta Llama3 70B using Amazon Bedrock

Sentiment analysis is a powerful technique that allows organizations to gain valuable insights from unstructured data sources, such as product reviews, social media posts, and customer feedback. In this article, we will explore various methods for performing sentiment analysis on data stored in Amazon Redshift, leveraging the capabilities of the Meta Llama3 70B language model through Amazon Bedrock.

We will begin with an overview of sentiment analysis, its importance, and the approaches commonly employed in this field. We will then introduce the Meta Llama3 70B model, covering its architecture, strengths, and potential applications in the context of sentiment analysis. Finally, we will demonstrate how to use Python to extract text data from Redshift and score it with the model through Amazon Bedrock.

What is Sentiment Analysis?

Sentiment Analysis is a process that transforms unstructured text into numeric sentiment scores, typically on a predefined scale, indicating positive, negative, or neutral sentiments. This technique leverages Natural Language Processing (NLP) to parse and extract the underlying emotional tone from text data, making it invaluable for applications such as social media monitoring, customer feedback analysis, and market research.

Large Language Models (LLMs) like Meta's Llama3 70B have propelled the effectiveness of Sentiment Analysis. These models, trained on vast corpora, understand nuanced language patterns, context, and semantics, enabling them to generate accurate sentiment scores. Through fine-tuning, LLMs adapt to domain-specific sentiment nuances, enhancing their precision and reliability.

Effective sentiment analysis hinges on meticulous data preparation. Data engineers play a pivotal role by interfacing between databases and LLMs, ensuring that text data is clean, normalized, and in a format suitable for model ingestion. This involves handling inconsistencies, managing scalability, and securing data integrity. By preparing data reliably, engineers lay the groundwork for accurate and meaningful sentiment insights, maximizing the value derived from LLMs.
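As a minimal sketch of this preparation step, the function below trims control characters, collapses whitespace, and truncates overly long reviews before they are sent to a model. The `normalize_review` name and the 2,000-character limit are illustrative assumptions, not part of any specific pipeline.

```python
import re

def normalize_review(text, max_chars=2000):
    """Clean a raw review string before sending it to an LLM.

    Hypothetical helper: removes control characters, collapses
    whitespace, and truncates to a conservative length limit.
    """
    if text is None:
        return ""
    # Drop non-printable control characters
    text = re.sub(r"[\x00-\x1f\x7f]", " ", text)
    # Collapse runs of whitespace into single spaces
    text = re.sub(r"\s+", " ", text).strip()
    # Truncate very long reviews to keep prompts bounded
    return text[:max_chars]
```

For example, `normalize_review("  Great\tproduct!\n")` returns `"Great product!"`.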

Business examples of Sentiment Analysis

  • Social Media Monitoring: Analyzing sentiment on social media platforms to gauge public opinion on products, services, or brand reputation.
  • Customer Feedback Analysis: Automatically categorizing customer reviews, support tickets, or survey responses to identify areas for improvement or potential issues.
  • Financial Market Prediction: Analyzing sentiment in news articles, financial reports, and social media to predict stock market trends or investment opportunities.

What is Meta Llama3 70B?

Meta Llama3 70B is a large language model developed by Meta AI, a subsidiary of Meta Platforms, Inc. It's a type of transformer-based language model, specifically a decoder-only architecture, with approximately 70 billion parameters. This model is trained on a massive dataset of text from the internet, books, and other sources, allowing it to learn patterns and relationships in language.

Pros:

  • High accuracy and fluency in generating text
  • Ability to understand and respond to complex queries and prompts
  • Can be fine-tuned for specific tasks and domains

Cons:

  • Requires significant computational resources and memory
  • May struggle with out-of-distribution or unseen data
  • Can be prone to generating biased or toxic content if not properly trained or filtered

Ideal use cases:

  • Conversational AI and chatbots
  • Text generation and summarization
  • Language translation and localization
  • Content creation and writing assistance
  • Question answering and knowledge retrieval systems

How to perform Sentiment Analysis in Redshift with Meta Llama3 70B using Python with the Amazon Bedrock SDK

Prerequisites for the boto3 Amazon Bedrock Python SDK

Start by installing the prerequisite libraries:

python3 -m pip install psycopg2-binary boto3

Afterwards, load your source data into Redshift.

Python boto3 for Meta Llama3 70B

The example below involves product reviews, and assumes that the source data has been loaded into a table named "stg_sample_reviews" with four columns: id (the primary key), stars, product and review.

The Python script is shown below. Note that it is good practice to handle credentials more securely than in this simple example: consider a secrets management service such as AWS Secrets Manager instead of environment variables or hardcoded values.

Also please note that handling large amounts of data using fetchall() can be inefficient, and may result in memory issues. For large datasets, you should use a cursor to fetch rows incrementally using the fetchmany() method instead.
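The incremental pattern can be sketched as follows. For illustration this uses Python's built-in sqlite3 module as a stand-in database, since it exposes the same DB-API 2.0 cursor interface as psycopg2; with a Redshift connection the fetch loop is identical.

```python
import sqlite3

# In-memory SQLite stands in for Redshift here; psycopg2 cursors
# expose the same DB-API 2.0 fetchmany() interface.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE stg_sample_reviews (id INTEGER, review TEXT)")
cur.executemany(
    "INSERT INTO stg_sample_reviews VALUES (?, ?)",
    [(i, f"review {i}") for i in range(10)],
)

cur.execute("SELECT id, review FROM stg_sample_reviews")
processed = 0
while True:
    batch = cur.fetchmany(4)  # fetch at most 4 rows per round trip
    if not batch:
        break
    for row in batch:
        processed += 1  # replace with a call such as analyze_sentiment(row[1])

conn.close()
```

Only one batch of rows is held in memory at a time, which keeps memory usage flat regardless of table size.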

If you are working in a schema other than "public" you will need the -c connection option to specify the object search path. Set your RS_OPTIONS environment variable to "-c search_path=yourSchemaName,public" replacing the schema name with your own. The newly created table will be added to this named schema.
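For example, replacing yourSchemaName with your own schema:

```shell
export RS_OPTIONS="-c search_path=yourSchemaName,public"
```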

import os
import psycopg2
import logging
import json
import boto3

logger = logging.getLogger("demo")

# Use the Amazon Bedrock InvokeModel API
def analyze_sentiment(text):
    client = boto3.client(service_name="bedrock-runtime", region_name="us-east-1")
    model_id = "meta.llama3-70b-instruct-v1:0"

    prompt = f"""<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Analyze the sentiment of the following text and return a score from 1 to 5, where 1 represents the most negative sentiment and 5 represents the most positive sentiment: {text}

Respond with a single number only. Do not include any notes, justification, explanation or confidence level, just the number.
<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""

    # max_gen_len caps the response length, since only a single digit is expected
    body = json.dumps({"prompt": prompt, "top_p": 0.9, "temperature": 0.5, "max_gen_len": 10})
    response = client.invoke_model(body=body, modelId=model_id, accept='application/json', contentType='application/json')
    response_body = json.loads(response.get('body').read())
    return response_body.get('generation').strip()

# Database connection parameters
db_params = {
    'dbname': os.environ["RS_DBNAME"],
    'user': os.environ["RS_USER"],
    'password': os.environ["RS_PASSWORD"],
    'host': os.environ["RS_HOST"],
    'port': os.environ["RS_PORT"],
    'options': os.environ["RS_OPTIONS"]
}

# Create a connection to the Redshift database
try:
    conn = psycopg2.connect(**db_params)
except Exception as e:
    print(f"Unable to connect to the database: {e}")
    exit(1)

conn.autocommit = True

# Create a cursor object
cur = conn.cursor()

# Fetch all the rows from the source table
query = 'SELECT "id", "review" FROM "stg_sample_reviews"'

try:
    # Create the table to hold the results, if it does not already exist
    cur.execute('''CREATE TABLE IF NOT EXISTS "stg_sample_reviews_genai"
 ("id" INT NOT NULL, "ai_score" VARCHAR(1024) NOT NULL)''')

    cur.execute('DELETE FROM "stg_sample_reviews_genai"')

    # Execute the query
    cur.execute(query)

    # Fetch and process each row
    for row in cur.fetchall():
        ai_score = analyze_sentiment(row[1])
        cur.execute('INSERT INTO "stg_sample_reviews_genai" ("id", "ai_score") VALUES (%s, %s)', (row[0], ai_score))

except Exception as e:
    print(f"SQL error: {e}")
finally:
    cur.close()
    conn.close()

After running the above script, you should find a new table has been created, which contains the AI-generated review score for every input record. Join this table to the original on the common id column to compare the AI-generated sentiment scores against the original star review.

The LLM was asked to score between 1 and 5, so you may choose to classify the scores more broadly as follows:

  • 4 or 5 - Positive
  • 3 - Neutral
  • 1 or 2 - Negative
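A minimal sketch of this classification, assuming the ai_score column holds the digit returned by the model (possibly as a string with surrounding whitespace):

```python
def classify_score(ai_score):
    """Map a 1-to-5 sentiment score to a broad category.

    Accepts an int or a string such as "4" or " 4 ", matching
    the raw text the LLM returns.
    """
    score = int(str(ai_score).strip())
    if score >= 4:
        return "Positive"
    if score == 3:
        return "Neutral"
    return "Negative"
```

For example, `classify_score("5")` returns `"Positive"`.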

Sentiment Analysis in Redshift using Matillion to run Meta Llama3 70B via Amazon Bedrock

In the Matillion Data Productivity Cloud, orchestration pipelines like the one shown in the screenshot below can:

  • Directly extract and load data, or call other pipelines to do so (as shown)
  • Invoke Meta Llama3 70B, with a nominated prompt, against all rows from a nominated table

Sentiment Analysis in Redshift using Matillion

Data pipelines such as this manage all the connectivity and plumbing between the Redshift source and target tables, and the LLM.

This allows you to focus on the overall design and architecture, and the data analysis. To compare the AI-generated sentiment scores against the original star review, use a transformation pipeline like the one in the next screenshot.

Checking the results of Sentiment Analysis in Redshift using Matillion

The data sample shows two of the records. In one case the LLM's score matches the original star rating exactly; in the other the two differ slightly, an example of the subjective nature of sentiment analysis.

Summary

Matillion empowers data teams to build and manage data pipelines rapidly for AI and analytics at scale. It integrates seamlessly with cloud providers, customer data platforms, large language models, and more. With a user-friendly interface offering pre-built components or coding options (SQL, Python, dbt), Matillion democratizes access to AI capabilities.

Its Git integration facilitates asynchronous collaboration, while AI-generated documentation enhances productivity. Matillion provides numerous no-code connectors, REST API connectivity, parameterization via variables, and hybrid SaaS deployment. Its data lineage, pushdown ELT, vector store connectivity, and AI components streamline AI data integration.

Overall, Matillion's augmented data engineering approach, including natural language pipeline building, boosts productivity and collaboration across all skill levels.

For more examples of Matillion's AI components in action, check out our library of AI Videos and Demos.

To try Matillion yourself, using your own data, sign up for a free trial.

If you are already a Matillion user or trial customer, you can download the sentiment analysis example shown in the screenshots earlier, and run it on your own platform.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.