- Blog
- 08.20.2024
- Leveraging AI, Data Fundamentals, Product
Sentiment Analysis in Snowflake with OpenAI GPT-4o

This article provides a comprehensive guide to conducting Sentiment Analysis in Snowflake using OpenAI's GPT-4o. We will demonstrate various techniques, including using Python, to effectively carry out sentiment analysis on your data.
Additionally, the article will detail the capabilities of OpenAI's GPT-4o, and will begin by providing an in-depth explanation of Sentiment Analysis to ensure a clear understanding of the foundational concepts.
What is Sentiment Analysis?
Sentiment Analysis using large language models (LLMs) leverages advanced natural language processing (NLP) techniques to automatically determine the emotional tone behind textual data. By utilizing transformer architectures, such as those found in models like GPT-4o, these systems can interpret nuanced sentiment signals from various sources, ranging from social media posts to customer reviews.
LLMs can perform sentiment analysis by providing context-aware interpretations, improving accuracy, and enabling more granular sentiment categorization beyond mere positive, negative, or neutral classifications.
Key stages in implementing sentiment analysis involve data preprocessing, tokenization, model training with domain-specific datasets, and fine-tuning to capture intricate sentiment variations.
Examples of Sentiment Analysis using LLMs:
- Social Media Monitoring: Analyzing Twitter feeds to gauge public sentiment towards major product launches or brand-related events.
- Customer Feedback Analysis: Processing large volumes of customer reviews on e-commerce platforms to ascertain product satisfaction and identify potential areas for improvement.
- Financial Market Sentiment: Evaluating news articles and financial reports to predict market movements based on collective investor sentiment.
What is OpenAI GPT-4o?
OpenAI GPT-4o represents a significant leap in generative AI capabilities tailored for scalable and efficient cloud database integration. Featuring an optimized transformer architecture, GPT-4o achieves lower latency and higher throughput, essential for real-time data stream processing. Enhanced contextual understanding allows for more accurate natural language queries and automation of ETL pipelines. Its advanced fine-tuning on domain-specific data ensures precise responses, vital for data normalization and anomaly detection.
GPT-4o offers remarkable improvements in natural language understanding and generation, making it ideal for a wide array of applications such as customer service bots, content creation, and data analysis. Its strengths lie in its ability to generate human-like responses, comprehend context more effectively, and handle complex queries with improved accuracy. However, its cons include potential biases in generated content, reliance on large computational resources, and occasional production of plausible but incorrect or nonsensical answers. GPT-4o is best used in scenarios where nuanced language understanding is crucial, such as drafting detailed reports, generating creative content, or providing sophisticated conversational interfaces while being mindful of the need for oversight to ensure accuracy and mitigate bias.
How to perform Sentiment Analysis in Snowflake with OpenAI GPT-4o using Python
Start by installing the prerequisite libraries:
python3 -m pip install snowflake-connector-python openai
Then load your source data into Snowflake. The example below involves product reviews, and the data has been loaded into a table named "stg_sample_reviews" with four columns: id (the primary key), stars, product and review.
Here is the Python script. Note it is good practice to handle credentials more securely than shown in this simple example. You might choose to use environment variables or a secret management service instead of hardcoding them.
Also please note that handling large amounts of data using fetchall() can be inefficient, and may result in memory issues. For large datasets, you should use a cursor to fetch rows incrementally using the fetchmany() method instead.
To authenticate with OpenAI you will need to set up an environment variable named OPENAI_API_KEY containing your API key.
import snowflake.connector
from openai import OpenAI
modelname = "gpt-4o"
systemPrompt = "Your job is to analyze online product reviews"
# Function that will be called for every input row
def process_row(row):
print(f"Processing id: {row[0]}")
userPrompt = f"""Provide a numeric rating that reflects the overall sentiment of the review.
The rating should be a single number between 1 and 5, where 1 represents the most negative sentiment and 5 represents the most positive sentiment.
Respond with the numeric rating only. Do not include any justification of the rating.
review: {row[1]}"""
ccresp = oaisdk.chat.completions.create(
model=modelname,
temperature=1,
messages=[ {"role": "system", "content": systemPrompt},
{"role": "user", "content": userPrompt} ])
msg = ccresp.choices[0].message.content.strip()
return(msg)
# Establish a Snowflake connection
conn = snowflake.connector.connect(
user=...,
password=...,
account=...,
warehouse=...,
database=...,
schema=...,
role=...
)
# Establish an OpenAI connection using environment variable OPENAI_API_KEY
oaisdk = OpenAI()
try:
# Create a cursor object using the connection
cur = conn.cursor()
# Create the destination table
cur.execute(f'CREATE TABLE IF NOT EXISTS "stg_sample_reviews_{modelname}" ("id" NUMBER(6,0) NOT NULL, "ai_score" VARCHAR(1024) NOT NULL)')
cur.execute(f'DELETE FROM "stg_sample_reviews_{modelname}"')
# Select source rows from the table
cur.execute('SELECT "id", "review" FROM "stg_sample_reviews"')
# Fetch all rows from the executed query
rows = cur.fetchall()
# Loop through the fetched rows and call the process_row function
for row in rows:
ai_score = process_row(row)
cur.execute(f'INSERT INTO "stg_sample_reviews_{modelname}" ("id", "ai_score") VALUES ({row[0]}, {ai_score})')
finally:
# Close the cursor and connection
if cur:
cur.close()
if conn:
conn.close()
After running the above script, you should find a new table has been created, which contains the AI-generated review score for every input record. Join this table to the original on the common "id" column to compare the AI-generated sentiment scores against the original star review.
The LLM was asked to score between 1 and 5, so you may choose to classify the scores more broadly as follows:
- 4 or 5 - Positive
- 3 - Neutral
- 1 or 2 - Negative
Sentiment Analysis in Snowflake using Matillion
In the Matillion Data Productivity Cloud, orchestration pipelines like the one shown in the screenshot below can:
- Directly extract and load data, or call other pipelines to do so (as shown)
- Invoke OpenAI GPT-4o, with a nominated prompt, against all rows from a nominated table
Performing Sentiment Analysis in Snowflake using Matillion
Data pipelines such as this manage all the connectivity and plumbing between the Databricks source and target tables, and the LLM.
This allows you to focus on the overall design and architecture, and the data analysis. To compare the AI-generated sentiment scores against the original star review, use a transformation pipeline like the one in the next screenshot.
Checking the results of Sentiment Analysis in Snowflake using Matillion
The data sample shows two of the records. In one case the LLM's decision matches the original sentiment identically, but in the other record the ratings differ slightly. This is an example of the subjective nature of sentiment analysis.
Summary
Matillion is the advanced data pipeline platform designed for rapid AI and analytics deployment at scale. It boosts productivity with its code-optional approach, offering a UI with pre-built components and the flexibility to use SQL, Python, or DBT. Seamlessly integrating with hyperscalers, CDPs, LLMs, and vector stores, Matillion democratizes AI access and streamlines workflows. Its first-class Git integration and AI-generated documentation promote asynchronous collaboration. The platform supports hybrid SaaS deployment and provides comprehensive data lineage and robust pushdown ELT capabilities. Leveraging no-code connectors and generative AI components, Matillion redefines data engineering productivity and flexibility, enabling efficient reverse ETL and parameterization through variables.
For more examples of Matillion's AI components in action, check out our library of AI Videos and Demos.
To try Matillion yourself, using your own data, sign up for a free trial.
If you are already a Matillion user or trial customer, you can download the sentiment analysis example shown in the screenshots earlier and run it on your own platform.
Ian Funnell
Data Alchemist
Ian Funnell, Data Alchemist at Matillion, curates The Data Geek weekly newsletter and manages the Matillion Exchange.
Follow Ian on LinkedIn: https://www.linkedin.com/in/ianfunnell
Featured Resources
Big Data London 2025: Key Takeaways and Maia Highlights
There’s no doubt about it – Maia dominated at Big Data London. Over the two-day event, word spread quickly about Maia’s ...
Learn more BlogSay Hello to Ask Matillion, Your New AI Assistant for Product Answers
We’re excited to introduce a powerful new addition to the Matillion experience: Ask Matillion.
Learn more BlogRethinking Data Pipeline Pricing
Discover how value-based data pipeline pricing improves ROI, controls costs, and scales data processing without billing surprises.
Learn more
Share: