- 06.24.2024
- Data Fundamentals, Product
Sentiment Analysis in Databricks with Mistral 7B Instruct using Amazon Bedrock

Sentiment Analysis is a powerful technique for understanding the emotional tone and sentiment behind textual data, making it a valuable tool for businesses seeking to gain insights from customer feedback, social media conversations, and other text-based sources.
This article will guide you through various approaches to performing Sentiment Analysis on textual data using Python, leveraging the capabilities of Databricks and the cutting-edge Mistral 7B Instruct language model through Amazon Bedrock.
We will begin by exploring the fundamentals of Sentiment Analysis, its applications, and the methods used to extract sentiment from text. We will then introduce Mistral 7B Instruct, a large language model developed by Mistral AI, and show how to use it with Databricks through Amazon Bedrock.
What is Sentiment Analysis?
Sentiment Analysis extracts a numeric sentiment score from unstructured text, turning subjective information such as opinions and emotions into a quantitative metric. This lets businesses gauge public sentiment and customer opinion at scale.
Large language models (LLMs), such as Mistral 7B Instruct, bring advanced natural language understanding to Sentiment Analysis. These models are trained on vast datasets and can discern nuances in text, which makes them well suited to sentiment scoring. Because they can parse complex language structures and pick up subtleties such as mixed or indirect opinions, they can have an edge over traditional methods on nuanced text.
A crucial aspect of Sentiment Analysis is data preparation. Data engineers play a pivotal role in ensuring that data is clean, relevant, and accessible. They must interface between the database and the LLM, orchestrating data pipelines that transform raw text into structured formats suitable for analysis. This involves tasks such as data extraction, noise reduction, tokenization, and ensuring consistency across the dataset, enabling the LLM to perform sentiment analysis accurately and reliably.
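As a rough illustration of the noise reduction and consistency steps described above, here is a minimal Python sketch that normalizes a raw review before it is sent to the LLM. The function name `clean_review` and the `max_chars` limit are assumptions made for this example, not part of any library, and a real pipeline would likely need rules tailored to its own data.

```python
import html
import re

_TAG_RE = re.compile(r"<[^>]+>")   # anything that looks like an HTML tag
_WS_RE = re.compile(r"\s+")        # runs of whitespace, including non-breaking spaces


def clean_review(text, max_chars=4000):
    """Return a normalized copy of a raw review, or '' for missing input.

    Hypothetical helper: max_chars is an arbitrary cap chosen for this sketch.
    """
    if text is None:
        return ""
    text = _TAG_RE.sub(" ", text)          # drop HTML tags left over from scraping
    text = html.unescape(text)             # turn entities such as &amp; into characters
    text = _WS_RE.sub(" ", text).strip()   # collapse whitespace so rows are consistent
    return text[:max_chars]                # cap the length of the prompt input


print(clean_review("<p>Great&nbsp;product!</p>\n\n  Works   well &amp; fast."))
```

Tags are removed before entities are decoded so that a literal comparison such as `5 &lt; 6` in a review is not mistaken for markup.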
Business examples of Sentiment Analysis
- Social media monitoring: Analyzing customer sentiments towards products, services, or brands.
- Customer feedback analysis: Categorizing customer reviews, comments, or surveys for product improvement or customer support.
- Financial market analysis: Predicting stock market trends based on sentiment analysis of news articles, social media, or financial reports.
What is Mistral 7B Instruct?
Mistral 7B Instruct is a large language model developed by Mistral AI, an AI company based in Paris, France. It is a transformer-based model, which means it uses a self-attention mechanism to process input data and generate outputs. The "7B" in its name refers to its parameter count, roughly 7 billion.
Mistral 7B Instruct is the instruction-tuned variant of the model, designed to follow natural-language instructions and generate text in response. It can also be fine-tuned on specific datasets to provide accurate and relevant responses to various prompts, making it suitable for applications such as customer support, educational content generation, and programming assistance.
Pros:
- Large parameter size allows for more accurate and nuanced responses
- Transformer architecture enables effective handling of complex instructions
- Can be fine-tuned on specific datasets for improved performance
Cons:
- Requires significant computational resources for training and inference
- May generate incorrect or irrelevant responses if not properly fine-tuned
- Limited to text-based inputs and outputs
Ideal use cases:
- Customer support: Providing accurate and personalized responses to customer inquiries
- Education: Generating educational content tailored to specific learning objectives
- Programming assistance: Helping users write code by providing suggestions and explanations.
How to perform Sentiment Analysis in Databricks with Mistral 7B Instruct using Python with the Amazon Bedrock SDK
Prerequisites for the boto3 Amazon Bedrock Python SDK
Start by installing the prerequisite libraries (note the Databricks SDK for Python is in beta, version 0.28.0, at the time of writing):
python3 -m pip install databricks-sdk boto3
Then load your source data into Databricks.
Python boto3 for Mistral 7B Instruct and Databricks SDK
The example below involves product reviews, and assumes that the source data has been loaded into a table named "stg_sample_reviews" with four columns: id (the primary key), stars, product and review.
The Python script is shown below. Note it is good practice to handle credentials more securely than shown in this simple example. You might choose to use a secret management service instead of environment variables or hardcoding.
import os
import re
import json
import logging

import boto3
import botocore
from botocore.exceptions import ClientError
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.sql import StatementParameterListItem

logger = logging.getLogger("demo")


# Use the Amazon Bedrock Converse API
def analyze_sentiment(text):
    bedrock = boto3.client(service_name="bedrock-runtime", region_name="us-east-1")
    model_id = "mistral.mistral-7b-instruct-v0:2"
    prompt = f"""Analyze the given text from an online review and determine the sentiment score. Return a single number between 1 and 5, with 1 being the most negative sentiment and 5 being the most positive sentiment. No further explanation or justification is required.
Text: {text}
Respond with a single number only. Do not include any notes, justification, explanation or confidence level, just the number.
"""
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"temperature": 0.3},
        additionalModelRequestFields={"top_k": 200},
    )
    # Keep only the leading token: drop everything from the first whitespace or period onward
    reply = response["output"]["message"]["content"][0]["text"].strip()
    return re.sub(r"[\s\.].*", "", reply, flags=re.DOTALL)


sqlWhId = os.environ["DBKS_SQL_WHID"]
vCatalog = os.environ["DBKS_CATALOG"]
vSchema = os.environ["DBKS_SCHEMA"]

# Connect to workspace
wsclient = WorkspaceClient(
    host=os.environ["DBKS_HOSTURL"],
    token=os.environ["DBKS_TOKEN"],
)

# Create (or replace) the target table for the AI-generated scores
ddl = wsclient.statement_execution.execute_statement(
    warehouse_id=sqlWhId,
    catalog=vCatalog,
    schema=vSchema,
    statement="CREATE OR REPLACE TABLE `stg_sample_reviews_genai` (`id` INT NOT NULL, `ai_score` VARCHAR(1024) NOT NULL)",
)

# Fetch rows from the table
xsr = wsclient.statement_execution.execute_statement(
    warehouse_id=sqlWhId,
    catalog=vCatalog,
    schema=vSchema,
    statement="SELECT `id`, `review` FROM `stg_sample_reviews`",
)

# Score each review and store the result, one row at a time
for r in xsr.result.data_array:
    ai_score = analyze_sentiment(r[1])
    print(f"ID {r[0]}: Score: {ai_score}")
    # ai_score is passed as STRING to match the VARCHAR column declared above
    dml = wsclient.statement_execution.execute_statement(
        warehouse_id=sqlWhId,
        catalog=vCatalog,
        schema=vSchema,
        statement="INSERT INTO `stg_sample_reviews_genai` (`id`, `ai_score`) VALUES (:id, :ai_score)",
        parameters=[
            StatementParameterListItem.from_dict({"name": "id", "type": "INT", "value": r[0]}),
            StatementParameterListItem.from_dict({"name": "ai_score", "type": "STRING", "value": ai_score}),
        ],
    )
After running the above script, you should find a new table has been created, which contains the AI-generated review score for every input record. Join this table to the original on the common id column to compare the AI-generated sentiment scores against the original star review.
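One way to make that comparison is sketched below. The query text joins the two tables from this example on `id`, and the hypothetical helper `summarize_agreement` (a name invented for this sketch) takes the resulting rows and reports how often the AI score matches the star rating. It assumes each row arrives as `(id, stars, ai_score)` with values that may be strings, and it skips any row where the score is not a whole number.

```python
# Query that could be run with the same statement execution call used in the script
COMPARE_SQL = """
SELECT s.id, s.stars, g.ai_score
FROM stg_sample_reviews AS s
JOIN stg_sample_reviews_genai AS g ON s.id = g.id
"""


def summarize_agreement(rows):
    """Compare star ratings with AI scores for rows shaped (id, stars, ai_score)."""
    compared = 0
    exact = 0
    total_abs_diff = 0
    skipped_ids = []
    for row_id, stars, ai_score in rows:
        try:
            s, a = int(stars), int(ai_score)
        except (TypeError, ValueError):
            skipped_ids.append(row_id)   # non-numeric or missing value
            continue
        compared += 1
        exact += int(s == a)
        total_abs_diff += abs(s - a)
    return {
        "compared": compared,
        "exact_match_rate": exact / compared if compared else None,
        "mean_abs_diff": total_abs_diff / compared if compared else None,
        "skipped_ids": skipped_ids,
    }


# Made-up sample rows, for illustration only
sample = [["1", "5", "5"], ["2", "4", "3"], ["3", "1", "n/a"]]
print(summarize_agreement(sample))
```

The mean absolute difference is included because a score that is off by one star is a much milder disagreement than one that is off by four.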
The LLM was asked to score between 1 and 5, so you may choose to classify the scores more broadly as follows:
- 4 or 5 - Positive
- 3 - Neutral
- 1 or 2 - Negative
Sentiment Analysis in Databricks using Matillion to run Mistral 7B Instruct via Amazon Bedrock
In the Matillion Data Productivity Cloud, orchestration pipelines like the one shown in the screenshot below can:
- Directly extract and load data, or call other pipelines to do so (as shown)
- Invoke Mistral 7B Instruct, with a nominated prompt, against all rows from a nominated table
[Screenshot: Sentiment Analysis in Databricks using Matillion]
Data pipelines such as this manage all the connectivity and plumbing between the Databricks source and target tables, and the LLM.
This allows you to focus on the overall design and architecture, and the data analysis. To compare the AI-generated sentiment scores against the original star review, use a transformation pipeline like the one in the next screenshot.
[Screenshot: Checking the results of Sentiment Analysis in Databricks using Matillion]
The data sample shows two of the records. In one case the LLM's score matches the original star rating exactly, but in the other the two ratings differ slightly. This illustrates the subjective nature of sentiment analysis.
Summary
Matillion is a data pipeline platform that empowers data teams to build and manage pipelines efficiently for AI and analytics at scale. It offers a code-optional approach, fostering productivity and collaboration. Matillion integrates seamlessly with hyperscalers, CDPs, LLMs, and more, democratizing access to AI. Its UI features pre-built components, but users can also code in SQL, Python, or dbt if preferred.
Matillion boasts first-class Git integration, AI-generated documentation, numerous no-code connectors, and the ability to build custom REST API connectors. It allows for parameterization using variables and offers pushdown ELT capabilities. Matillion's AI components, including Generative AI prompting, RAG, and vector store connectivity, facilitate AI integration. It provides data lineage for AI and enables reverse ETL of AI-generated insights. Additionally, Matillion Copilot empowers users to build data pipelines using natural language, fostering augmented data engineering.
For more examples of Matillion's AI components in action, check out our library of AI Videos and Demos.
To try Matillion yourself, using your own data, sign up for a free trial.
If you are already a Matillion user or trial customer, you can download the sentiment analysis example shown in the screenshots earlier, and run it on your own platform.