- Blog
- 08.09.2024
- Data Fundamentals, Leveraging AI, Product
Sentiment Analysis in Snowflake with OpenAI GPT-4

This article covers the practical implementation of sentiment analysis within Snowflake, using the advanced capabilities of OpenAI GPT-4. We'll explore various techniques for conducting sentiment analysis, including detailed Python examples to guide you through each step of the process.
The article will begin with a clear explanation of sentiment analysis, establishing a foundational understanding before introducing the functionalities of OpenAI GPT-4 and demonstrating how to integrate these tools effectively within the Snowflake environment.
What is Sentiment Analysis?
Sentiment Analysis leverages large language models (LLMs) to evaluate and categorize the emotional tone of textual data. By utilizing vast amounts of preprocessed language data, LLMs can discern sentiment intricacies ranging from positive and negative to neutral. This is crucial for interpreting customer feedback, social media interactions, or any text-based communication at scale. The robust algorithms within these models can comprehend context, idioms, and even nuanced expressions, providing high accuracy in sentiment categorization and offering valuable insights for data-driven decision-making.
Examples of Sentiment Analysis Applications:
- Social Media Monitoring: Analyzing sentiments in tweets to gauge public opinion on product launches or political events.
- Customer Feedback Analysis: Parsing reviews and survey responses to determine customer satisfaction and identify areas for improvement.
- Market Sentiment Evaluation: Interpreting investor sentiments from financial news articles and forums to predict stock market trends.
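Before turning to LLMs, it can help to see the input/output shape of the task. The toy classifier below uses a hand-built keyword lexicon (my own illustrative invention, not part of any library and far cruder than what an LLM does) purely to show what "categorizing the emotional tone of text" means in code:

```python
def classify_sentiment(text):
    """Toy lexicon-based sentiment classifier, for illustration only.
    An LLM handles context, idioms, and nuance; this does not."""
    positive = {"love", "great", "excellent", "happy"}
    negative = {"hate", "terrible", "awful", "poor"}
    # Crude tokenization: lowercase, split on whitespace, strip punctuation
    words = {w.strip(".,!?") for w in text.lower().split()}
    pos, neg = len(words & positive), len(words & negative)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

print(classify_sentiment("I love this product, it is great!"))
```

An LLM-based approach, shown later in this article, replaces the lexicon with a model that understands context, but the output categories are the same.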
What is OpenAI GPT-4?
OpenAI GPT-4, leveraging deep learning and transformer architectures, demonstrates profound advancements in natural language understanding and generation. Key improvements include enhanced contextual awareness and more efficient scaling, making it invaluable for generating synthetic data and automating ETL pipelines. Its ability to process and generate large volumes of coherent, contextually rich text allows data engineers to streamline data annotation processes and improve model training datasets. For data engineers, GPT-4's sophisticated handling of semi-structured and unstructured data simplifies schema design and facilitates better data integration across disparate systems.
GPT-4 represents a significant advancement in generative AI, offering numerous versatile applications, but it comes with both advantages and drawbacks. One of the primary benefits is its ability to understand and generate human-like text, making it highly useful for tasks such as drafting content, coding assistance, and providing conversational agents. Its extensive training on diverse datasets allows it to produce coherent and contextually relevant responses. However, GPT-4 also has limitations, including occasional inaccuracies, the potential for generating biased or inappropriate content, and substantial computational resource requirements, which can be costly. It is best utilized in scenarios where creative, yet reliable, text generation is required and where its outputs can be appropriately reviewed and curated before final use, such as in content creation or preliminary data analysis.
How to perform Sentiment Analysis in Snowflake with OpenAI GPT-4 using Python
Start by installing the prerequisite libraries:
python3 -m pip install snowflake-connector-python openai
Then, load your source data into Snowflake. The example below involves product reviews, and the data has been loaded into a table named "stg_sample_reviews" with four columns: id (the primary key), stars, product, and review.
Here is the Python script. Note it is good practice to handle credentials more securely than shown in this simple example. You might choose to use environment variables or a secret management service instead of hardcoding them.
Please note that loading large result sets with fetchall() can be inefficient and may cause memory issues. For large datasets, you should fetch rows incrementally using the cursor's fetchmany() method instead.
To authenticate with OpenAI you will need to set up an environment variable named OPENAI_API_KEY containing your API key.
import snowflake.connector
from openai import OpenAI

modelname = "gpt-4"
systemPrompt = "Your job is to analyze online product reviews"

# Function that will be called for every input row
def process_row(row):
    print(f"Processing id: {row[0]}")
    userPrompt = f"""Provide a numeric rating that reflects the overall sentiment of the review.
The rating should be a single number between 1 and 5, where 1 represents the most negative sentiment and 5 represents the most positive sentiment.
Respond with the numeric rating only. Do not include any justification of the rating.
review: {row[1]}"""
    ccresp = oaisdk.chat.completions.create(
        model=modelname,
        temperature=1,
        messages=[{"role": "system", "content": systemPrompt},
                  {"role": "user", "content": userPrompt}])
    msg = ccresp.choices[0].message.content.strip()
    return msg

# Establish a Snowflake connection
conn = snowflake.connector.connect(
    user=...,
    password=...,
    account=...,
    warehouse=...,
    database=...,
    schema=...,
    role=...
)

# Establish an OpenAI connection using environment variable OPENAI_API_KEY
oaisdk = OpenAI()

cur = None
try:
    # Create a cursor object using the connection
    cur = conn.cursor()
    # Create the destination table
    cur.execute(f'CREATE TABLE IF NOT EXISTS "stg_sample_reviews_{modelname}" ("id" NUMBER(6,0) NOT NULL, "ai_score" VARCHAR(1024) NOT NULL)')
    cur.execute(f'DELETE FROM "stg_sample_reviews_{modelname}"')
    # Select source rows from the table
    cur.execute('SELECT "id", "review" FROM "stg_sample_reviews"')
    # Fetch all rows from the executed query
    rows = cur.fetchall()
    # Loop through the fetched rows and call the process_row function
    for row in rows:
        ai_score = process_row(row)
        # Use bound parameters so the score is inserted safely as a string
        cur.execute(f'INSERT INTO "stg_sample_reviews_{modelname}" ("id", "ai_score") VALUES (%s, %s)', (row[0], ai_score))
finally:
    # Close the cursor and connection
    if cur:
        cur.close()
    if conn:
        conn.close()
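As noted earlier, fetchall() pulls the entire result set into memory at once; for large tables the row loop is better structured around fetchmany(). A minimal sketch of that loop shape is below. The helper name `process_in_batches` and the stand-in cursor are my own illustrative inventions; in the real script, `cur` would be the Snowflake cursor and the handler would wrap `process_row` plus the INSERT:

```python
def process_in_batches(cur, handle_row, batch_size=100):
    """Fetch rows in fixed-size batches via fetchmany() instead of fetchall(),
    so only batch_size rows are held in memory at a time."""
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:          # empty batch means the result set is exhausted
            break
        for row in rows:
            handle_row(row)

class _DemoCursor:
    """Stand-in exposing the same fetchmany() contract as a DB-API cursor."""
    def __init__(self, rows):
        self._rows = rows
    def fetchmany(self, size):
        batch, self._rows = self._rows[:size], self._rows[size:]
        return batch

seen = []
process_in_batches(_DemoCursor([(1, "great"), (2, "poor"), (3, "okay")]), seen.append, batch_size=2)
```

The same pattern works unchanged with the Snowflake cursor, since it implements the standard DB-API fetchmany() interface.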
After running the above script, you should find a new table has been created containing the AI-generated review score for every input record. Join this table to the original on the common "id" column to compare the AI-generated sentiment scores against the original star ratings.
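The join itself can run in Snowflake with a query along these lines (table and column names follow the article's example; the sample rows below are hypothetical):

```python
# The comparison query, joining AI scores back to the original reviews:
compare_sql = '''
SELECT r."id", r."stars", s."ai_score"
FROM "stg_sample_reviews" r
JOIN "stg_sample_reviews_gpt-4" s ON r."id" = s."id"
'''

# Hypothetical sample of the joined result set: (id, original stars, AI score)
joined_rows = [
    (101, 5, "5"),
    (102, 2, "1"),
    (103, 4, "4"),
]

# How often does the AI score exactly match the original star rating?
matches = sum(1 for _id, stars, ai in joined_rows if stars == int(ai))
agreement = matches / len(joined_rows)
print(f"Exact agreement: {agreement:.0%}")
```

In practice you would execute `compare_sql` against the live connection and iterate over the real rows rather than a hardcoded sample.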
The LLM was asked to score between 1 and 5, so you may choose to classify the scores more broadly as follows:
- 4 or 5 - Positive
- 3 - Neutral
- 1 or 2 - Negative
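That three-way mapping is simple enough to express directly. A small sketch (the function name is my own; note the AI score arrives as a string, since the destination column is VARCHAR):

```python
def classify_score(score):
    """Map a 1-5 sentiment score to a broad category.
    Accepts int or numeric string, since ai_score is stored as VARCHAR."""
    score = int(score)
    if score >= 4:
        return "Positive"
    if score == 3:
        return "Neutral"
    return "Negative"

print(classify_score("5"))
```

The same bucketing could equally be done in SQL with a CASE expression during the comparison join.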
Sentiment Analysis in Snowflake using Matillion
In the Matillion Data Productivity Cloud, orchestration pipelines like the one shown in the screenshot below can:
- Directly extract and load data or call other pipelines to do so (as shown)
- Invoke OpenAI GPT-4, with a nominated prompt, against all rows from a nominated table
Performing Sentiment Analysis in Snowflake using Matillion
Data pipelines such as this manage all the connectivity and plumbing between the Snowflake source and target tables and the LLM.
This allows you to focus on the overall design, architecture, and data analysis. To compare the AI-generated sentiment scores against the original star ratings, use a transformation pipeline like the one in the next screenshot.
Checking the results of Sentiment Analysis in Snowflake using Matillion
The data sample shows two of the records. In one case, the LLM's decision matches the original sentiment identically, but in the other record, the ratings differ slightly. This is an example of the subjective nature of sentiment analysis.
Summary
Matillion is a robust data pipeline platform that enhances developer productivity and collaboration by accelerating pipeline development and management for AI and analytics. It stands out with its integrated support for hyperscalers, CDPs, LLMs, and REST APIs. The platform features a versatile UI packed with pre-built components, no-code connectors, and first-class Git integration for asynchronous work. Matillion offers pushdown ELT, hybrid SaaS deployment, and extensive data lineage capabilities. Users can optionally code in SQL, Python, or dbt, and leverage AI components and vector store connectivity. AI-driven tools like Matillion Copilot allow natural language-based pipeline construction, enriching augmented data engineering workflows while ensuring scalability and efficiency.
For more examples of Matillion's AI components in action, check out our library of AI Videos and Demos.
To try Matillion yourself, using your own data, sign up for a free trial.
If you are already a Matillion user or trial customer, you can download the sentiment analysis example shown in the screenshots earlier and run it on your own platform.
Ian Funnell
Data Alchemist
Ian Funnell, Data Alchemist at Matillion, curates The Data Geek weekly newsletter and manages the Matillion Exchange.
Follow Ian on LinkedIn: https://www.linkedin.com/in/ianfunnell