- Blog
- 06.24.2024
- Data Fundamentals
Sentiment Analysis in Snowflake with Mistral 7B Instruct using Amazon Bedrock

Sentiment Analysis is a powerful technique that enables organizations to gain valuable insights from unstructured data sources, such as customer reviews, social media posts, and survey responses.
This article walks through two approaches to performing Sentiment Analysis in Snowflake: a Python script and a Matillion pipeline. It begins with an overview of Sentiment Analysis and of Mistral 7B Instruct, an instruction-tuned large language model, and then shows how to call the model through Amazon Bedrock to extract sentiment from data held in Snowflake, supporting data-driven decision-making.
What is Sentiment Analysis?
Sentiment Analysis is a process that quantifies feelings expressed within unstructured text by assigning a numeric sentiment score, often on a scale ranging from highly negative to highly positive. This enables data-driven insights into user opinions and customer feedback.
Large Language Models (LLMs), such as Mistral 7B Instruct, are well suited to Sentiment Analysis. These models use large neural networks trained on diverse datasets to interpret the context and nuances of human language, so they can often score sentiment from a plain-language prompt without task-specific training.
Reliable data preparation is paramount for effective Sentiment Analysis. Data engineers must curate large volumes of text and ensure it is properly cleaned and normalized before it is sent to the LLM, which performs its own tokenization. Data engineers act as the bridge between large, often noisy databases and the language model. A robust and efficient data pipeline is vital, as even minor inconsistencies can markedly affect the quality and reliability of the sentiment scores produced.
Business examples of Sentiment Analysis
- Customer Feedback Analysis: Analyze customer reviews, social media mentions, and support interactions to gauge sentiment towards products or services, enabling data-driven decision-making.
- Brand Reputation Monitoring: Track online conversations and sentiment around a brand, enabling proactive reputation management and crisis mitigation.
- Customer Support Interactions: Analyze sentiment from emails, live chat transcripts, and call center recordings to assess customer satisfaction.
What is Mistral 7B Instruct?
Mistral 7B Instruct is a large language model developed by Mistral AI, a leading AI research lab based in Paris, France. It is a transformer-based model, meaning it uses a self-attention mechanism to understand the context of words in a sentence and generate responses. The "7B" in its name refers to its roughly 7 billion parameters, which makes it one of the smaller, more lightweight LLMs in common use.
Mistral 7B Instruct is designed to understand and generate human-like text based on given instructions or prompts. It can perform various text-related tasks such as text generation, summarization, translation, and question-answering. It also supports multi-turn dialog, allowing it to engage in extended conversations with users.
Pros:
- High-quality text generation and understanding
- Multi-turn dialog capability
- Versatility in performing various text-related tasks
- Can generate text in multiple languages
Cons:
- May sometimes provide incorrect or inappropriate responses
- May not fully understand complex instructions or context
- May require significant computational resources for extended conversations
Ideal use cases:
- Customer service and support chatbots
- Content generation for marketing and advertising
- Language translation services
- Automated responses for frequently asked questions
- Assisting in writing and editing tasks
How to perform Sentiment Analysis in Snowflake with Mistral 7B Instruct using Python with the Amazon Bedrock SDK
Prerequisites for the boto3 Amazon Bedrock Python SDK
Start by installing the prerequisite libraries:
python3 -m pip install snowflake-connector-python boto3
Afterwards, load your source data into Snowflake.
Python boto3 for Mistral 7B Instruct
The example below involves product reviews, and assumes that the data has been loaded into a database table named "stg_sample_reviews" with four columns: id (the primary key), stars, product and review.
The Python script is shown below. Note it is good practice to handle credentials more securely than shown in this simple example. You might choose to use a secret management service instead of environment variables or hardcoding.
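As a minimal sketch of the secret management option, the function below reads connection settings from AWS Secrets Manager. It assumes the secret is stored as a JSON document; the function name load_snowflake_credentials and the optional client argument are hypothetical and not part of the original script.

```python
import json


def load_snowflake_credentials(secret_id, client=None):
    """Fetch a JSON secret and return it as a dict of connection settings."""
    if client is None:
        # Imported lazily so the function can also be exercised with a stub client
        import boto3
        client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    # Assumption: the secret was saved as a JSON string of key/value pairs
    return json.loads(response["SecretString"])
```

If the keys in the secret match the argument names of snowflake.connector.connect() (user, password, account and so on), the returned dictionary could be passed to it with ** unpacking.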
Also please note that handling large amounts of data using fetchall() can be inefficient, and may result in memory issues. For large datasets, you should use a cursor to fetch rows incrementally using the fetchmany() method instead.
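The incremental pattern can be sketched as follows. To keep the example runnable without credentials, it uses Python's built-in sqlite3 module as a stand-in for the Snowflake connection; the helper name iter_rows and the batch size are illustrative assumptions.

```python
import sqlite3


def iter_rows(cursor, batch_size=1000):
    """Yield rows one at a time while fetching them from the cursor in batches."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield from batch


# sqlite3 stands in for Snowflake here so the sketch runs anywhere
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE stg_sample_reviews (id INTEGER, review TEXT)")
demo_cur.executemany(
    "INSERT INTO stg_sample_reviews VALUES (?, ?)",
    [(i, f"review {i}") for i in range(1, 8)],
)
demo_cur.execute("SELECT id, review FROM stg_sample_reviews ORDER BY id")
# Only batch_size rows are fetched from the cursor per round trip
fetched_ids = [row[0] for row in iter_rows(demo_cur, batch_size=3)]
demo_conn.close()
```

When adapting this pattern, run the INSERT statements on a second cursor, because executing a new statement on the cursor being read discards its pending result set. The full script that follows keeps fetchall() for brevity.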
import os
import re
import logging

import boto3
import snowflake.connector
from botocore.exceptions import ClientError

logger = logging.getLogger("demo")

# Create the Amazon Bedrock runtime client once and reuse it for every call
bedrock = boto3.client(service_name="bedrock-runtime", region_name="us-east-1")
MODEL_ID = "mistral.mistral-7b-instruct-v0:2"


# Use the Amazon Bedrock Converse API
def analyze_sentiment(text):
    prompt = f"""Analyze the given text from an online review and determine the sentiment score. Return a single number between 1 and 5, with 1 being the most negative sentiment and 5 being the most positive sentiment. No further explanation or justification is required.
Text: {text}
Respond with a single number only. Do not include any notes, justification, explanation or confidence level, just the number.
"""
    try:
        response = bedrock.converse(
            modelId=MODEL_ID,
            messages=[{"role": "user", "content": [{"text": prompt}]}],
            inferenceConfig={"temperature": 0.3},
            additionalModelRequestFields={"top_k": 200},
        )
    except ClientError:
        logger.exception("Bedrock call failed for model %s", MODEL_ID)
        raise
    reply = response["output"]["message"]["content"][0]["text"].strip()
    # Keep only the leading token: drop everything from the first whitespace or period onwards
    return re.sub(r"[\s.].*", "", reply, flags=re.DOTALL)


# Establish a Snowflake connection
conn = snowflake.connector.connect(
    user=os.environ["SF_USER"],
    password=os.environ["SF_PASSWORD"],
    account=os.environ["SF_ACCOUNT"],
    warehouse=os.environ["SF_WH"],
    database=os.environ["SF_DB"],
    schema=os.environ["SF_SCHEMA"],
    role=os.environ["SF_ROLE"],
)
cur = None
try:
    # Create a cursor object using the connection
    cur = conn.cursor()
    # Create the destination table
    cur.execute('CREATE OR REPLACE TABLE "stg_sample_reviews_genai" ("id" NUMBER(6,0) NOT NULL, "ai_score" VARCHAR(1024) NOT NULL)')
    # Select source rows from the table
    cur.execute('SELECT "id", "review" FROM "stg_sample_reviews"')
    # Fetch all rows from the executed query
    rows = cur.fetchall()
    # Loop through the fetched rows and call the analyze_sentiment function
    for row in rows:
        ai_score = analyze_sentiment(row[1])
        # Bind the values as parameters rather than formatting them into the SQL text
        cur.execute('INSERT INTO "stg_sample_reviews_genai" ("id", "ai_score") VALUES (%s, %s)', (row[0], ai_score))
finally:
    # Close the cursor (if it was created) and the connection
    if cur:
        cur.close()
    conn.close()
After running the above script, you should find that a new table has been created, containing the AI-generated review score for every input record. Join this table to the original on the common id column to compare the AI-generated sentiment scores against the original star ratings.
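The comparison can also be sketched in plain Python once both tables have been read into id-keyed dictionaries. The function name compare_scores is hypothetical, and the sample values are made up for illustration only; they are not real results.

```python
def compare_scores(stars_by_id, ai_score_by_id):
    """Inner-join two id-keyed mappings and report how the scores differ."""
    results = []
    for review_id in sorted(stars_by_id.keys() & ai_score_by_id.keys()):
        stars = int(stars_by_id[review_id])
        # ai_score is stored as VARCHAR, so convert it before comparing
        ai_score = int(ai_score_by_id[review_id])
        results.append((review_id, stars, ai_score, ai_score - stars))
    return results


# Made-up values for illustration only
sample_stars = {1: 5, 2: 2, 3: 4}
sample_ai = {1: "5", 2: "3", 3: "4"}
comparison = compare_scores(sample_stars, sample_ai)
exact_matches = sum(1 for _, _, _, diff in comparison if diff == 0)
```

Note that int() raises ValueError if the model returned something other than a whole number, so a production version would need to handle that case.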
The LLM was asked to score between 1 and 5, so you may choose to classify the scores more broadly as follows:
- 4 or 5 - Positive
- 3 - Neutral
- 1 or 2 - Negative
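The mapping above can be expressed as a small function. This is a sketch: the name classify_score and the "Unknown" fallback for values outside 1 to 5 are assumptions, added because the model's reply is free text and may not always be a valid score.

```python
def classify_score(score):
    """Map a 1-5 sentiment score to a broad label."""
    try:
        value = int(str(score).strip())
    except ValueError:
        # The reply was not a whole number
        return "Unknown"
    if value in (4, 5):
        return "Positive"
    if value == 3:
        return "Neutral"
    if value in (1, 2):
        return "Negative"
    return "Unknown"
```

The function accepts either a number or the string stored in the ai_score column, for example classify_score("4") or classify_score(2).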
Sentiment Analysis in Snowflake using Matillion to run Mistral 7B Instruct via Amazon Bedrock
In the Matillion Data Productivity Cloud, orchestration pipelines like the one shown in the screenshot below can:
- Directly extract and load data, or call other pipelines to do so (as shown)
- Invoke Mistral 7B Instruct, with a nominated prompt, against all rows from a nominated table
[Screenshot: Sentiment Analysis in Snowflake using Matillion]
Data pipelines such as this manage all the connectivity and plumbing between the Snowflake source and target tables, and the LLM.
This allows you to focus on the overall design and architecture, and the data analysis. To compare the AI-generated sentiment scores against the original star ratings, use a transformation pipeline like the one in the next screenshot.
[Screenshot: Checking the results of Sentiment Analysis in Snowflake using Matillion]
The data sample shows two of the records. In one record, the LLM's score exactly matches the original star rating; in the other, the two differ slightly. This illustrates the subjective nature of sentiment analysis.
Summary
Matillion is a data pipeline platform that enables data teams to build and manage pipelines rapidly for AI and analytics at scale. It offers a code-optional UI with pre-built components, or users can code in SQL, Python, or dbt. Matillion integrates with cloud platforms, CDPs, LLMs, and provides AI-generated documentation. It has no-code connectors, REST API connectivity, parameterization with variables, and hybrid SaaS deployment.
Matillion supports data lineage, pushdown ELT, AI components for prompting and vector stores, reverse ETL for insights, and a natural language copilot. Its unified platform handles pipeline orchestration complexity, enables unlimited scaling, and brings AI capabilities to augment data engineering workflows.
For more examples of Matillion's AI components in action, check out our library of AI Videos and Demos.
To try Matillion yourself, using your own data, sign up for a free trial.
If you are already a Matillion user or trial customer, you can download the sentiment analysis example shown in the screenshots earlier, and run it on your own platform.