- Blog
- 07.26.2024
- Product, Data Fundamentals, Leveraging AI
Crafting tailored responses with Gen AI: No coding required!

The need to personalize customer interactions using PII (Personally Identifiable Information) is growing, and in 2024 it presents data teams with a complex balancing act between data processing and data sovereignty demands.
Data sovereignty is an increasing concern as Large Language Models (LLMs) are integrated into data stacks, so it's more important than ever to ensure that your data, especially PII, doesn't leave your cloud infrastructure.
You can address these security concerns by using Snowflake Cortex's hosted LLMs to process your unstructured PII data inside Snowflake's secure environment, under strict privacy protections that help you stay compliant.
With that in mind, we'll use this blog to show you how we use PII and non-PII data to analyze reviews and create personalized responses. We'll also explain how separating PII from "safe for sharing" data allows you to leverage LLMs on public internet platforms like OpenAI's GPT-4 while processing PII data securely within Snowflake Cortex.
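To make the separation concrete, here is a deliberately naive sketch of stripping PII before text leaves your environment. The regex patterns and the `redact` helper are illustrative assumptions, not part of the pipeline described here; in practice this detection step would run inside Snowflake Cortex (for example, via an LLM pass) rather than with hand-rolled patterns.

```python
import re

# Illustrative only: naive regex patterns for two common PII shapes.
# A real pipeline would use a dedicated PII-detection step inside
# Snowflake Cortex rather than hand-written patterns.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace anything that looks like PII with a placeholder token."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

review = "Terrible battery life. Contact me at jane.doe@example.com or 555-123-4567."
print(redact(review))
# The redacted copy is what could safely go to a public LLM;
# the original stays inside your own cloud.
```

The key idea is that only the redacted text ever crosses the boundary to a public LLM, while the original review, with its identifiers intact, never leaves Snowflake.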
This approach lets us automate the response to reviews, complaints, and praise on a large scale safely without requiring manual intervention, addressing a common challenge many businesses face today.
Let's dive in.
Semi-structured and Unstructured Data Sources
First, we need to obtain our source data, which could include product reviews, service complaints, praise from supporters across social media, and more. This type of data—known as unstructured data—often contains sentiments and useful indicators of how the market perceives your product or service.
These insights can be incredibly valuable if someone takes the time to read through and categorize them. Previously, this was a costly task, requiring either hiring an external agency or using your in-house analytics teams to sift through the data and provide at least a high-level categorization of consumer sentiment regarding a specific product or service.
However, with the rise and adoption of large language models (LLMs) across marketing and product insights teams, this type of analysis no longer needs to be as painstaking and slow as it once was.
Taking it a step further, what if we could automate responses to particularly poor reviews? Imagine review sites where a company responds directly to each concern raised, providing thoughtful attention to detail and highly personalized responses rather than a generic "Thanks for your review; we are sorry to hear about...".
Data sources come in various formats, such as CSV files, Excel files, Google Sheets, APIs, and web scraping. Matillion's integrations with all these types of data sources make it easy to capture inputs from across the business.
I can then load them into my cloud data warehouse. In this case, I'm using Snowflake because of Matillion's native integration with Snowflake Cortex. This allows me to work with PII data inside my own cloud without sending it to an external LLM, which some consider riskier in terms of customer data security.

After loading the data, I use a Data Transfer component that supports various source locations like SFTP, Windows File Share, HTTPS, Blob, and S3. In this case, I'm loading the data into S3, which is our data lake.
Next, I create a VARIANT data table in Snowflake. This native integration leverages Snowflake's capability to store unstructured data, such as free-text reviews, without needing to convert it into structured data beforehand. This initial step is highly beneficial as it allows us to retain all the data without worrying about pre-processing or cleaning during the transfer to our destination.
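To illustrate why keeping the raw document intact is useful, here is a small sketch of working with a semi-structured review payload. The field names and the sample record are hypothetical; the point is that nothing needs to be flattened or cleaned before landing in a VARIANT column, because fields can be pulled out later on demand.

```python
import json

# Hypothetical raw review payload, stored whole in a VARIANT column.
# The field names here are illustrative, not a fixed schema.
raw = '''{
  "review_id": 101,
  "text": "The app crashes every time I open the settings page.",
  "rating": 1,
  "reviewer": {"name": "Sam", "email": "sam@example.com"}
}'''

record = json.loads(raw)

# Because the raw document was kept intact, we can extract fields later
# without having flattened (and possibly lost) anything upfront.
print(record["text"])
print(record.get("reviewer", {}).get("email", "no email on file"))
```

If a new field appears in tomorrow's reviews, nothing breaks: it simply rides along in the raw document until you decide to use it.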
Now that the data is in Snowflake, I want to make the most of the non-PII data. We will take the full review, stripped of any identifiers (an LLM could also be used to detect and remove any that slipped through), and then input this data into a hosted LLM model with vectorization capabilities.
A brief introduction to Vectorization
Vector databases are powerful tools that store unstructured data as numerical embeddings, a mathematical representation in which "closeness" reflects similarity of meaning. If you're new to vector databases, there's no need to worry; they are simply handy data stores that let us group similar pieces of text together based on patterns.
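Here is a minimal sketch of what "closeness" means in practice, assuming the reviews have already been turned into embedding vectors. The three-dimensional vectors below are toy values chosen for illustration; real embedding models produce hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Closeness of two embedding vectors: near 1.0 = very similar,
    near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-dimensional embeddings (illustrative values only).
battery_review_1 = [0.9, 0.1, 0.0]
battery_review_2 = [0.8, 0.2, 0.1]
pricing_review   = [0.0, 0.1, 0.9]

print(cosine_similarity(battery_review_1, battery_review_2))  # close to 1
print(cosine_similarity(battery_review_1, pricing_review))    # close to 0
```

Two complaints about battery life land near each other in the vector space even if they share no exact words, which is what makes grouping and frequency analysis possible.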
One significant advantage of vector databases is that when integrated with Large Language Models in platforms like Matillion, they can provide insights that LLMs alone cannot deliver. For instance, imagine we have thousands of product reviews stored in a vector database collected over a long period.
When a new set of reviews comes in, or if we scrape new reviews, we can ask the models to identify how common a specific type of review is. For example, if a particular product line or service has an unknown defect, the vector database, containing both historic and new reviews, can help us compare and categorize the data.
In this example, I asked the model to perform three tasks:
- Perform Sentiment Analysis: Provide a sentiment score ranging from -1.0 to 1.0 for each review.
- Identify Categories: Determine which categories the review addresses, such as Performance, Functionality, or Pricing.
- Check Review Frequency: Use the vector database to determine if the type of review is highly common, common, rare, or very rare.
By leveraging both vector databases and LLMs, we can gain deeper insights and more accurately categorize and analyze incoming reviews.
Data Pipelines integrating Vector Databases and LLMs
When this pipeline runs, Matillion coordinates the LLMs and embedded Vector integration for each full-text review, then brings back three responses to me in a semi-structured JSON format.
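As a sketch of what that semi-structured response might look like, here is one hypothetical JSON payload per review, covering the three tasks above. The exact keys and value formats Matillion returns may differ; this is just the general shape.

```python
import json

# Hypothetical shape of the per-review LLM response; actual key names
# in the pipeline may differ.
llm_response = '''{
  "sentiment_score": -0.8,
  "categories": ["Performance", "Functionality"],
  "frequency": "rare"
}'''

result = json.loads(llm_response)
print(result["sentiment_score"], result["categories"], result["frequency"])
```

Because the response is structured, each field can flow straight into downstream transformation steps without any text parsing.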
Looking at the questions above, we're presenting them to the LLMs in a brief and high-level manner. However, the structure of the API calls to the LLMs could allow us to delve into more detail if desired.
For example, we could include at least 50 additional sub-categories for reviews, such as: "Performance: Speed," "Performance: Speed: Too Slow," or "Functionality: Did Not Meet Expectations."
You could even instruct the LLM to use the existing Vector DB to categorize these reviews more accurately based on the new data provided.
Once we receive the responses from the LLM model, we can merge this data with the PII data using an ID or a similar identifier within a Matillion transformation pipeline. At this stage, we can filter out negative sentiment scores to address the reviews that most significantly impact the Net Promoter Score (NPS).
For instance, in this example, I am filtering for scores less than -0.5 in sentiment.
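In plain code, that filter amounts to the one-liner below. The review records are made-up samples and the field names are assumptions; in the actual pipeline this happens in a Matillion transformation component, not in Python.

```python
# Made-up sample records; field names are illustrative.
reviews = [
    {"id": 1, "sentiment_score": -0.8, "category": "Performance"},
    {"id": 2, "sentiment_score":  0.4, "category": "Pricing"},
    {"id": 3, "sentiment_score": -0.6, "category": "Functionality"},
]

# Keep only strongly negative reviews, the ones most likely to hurt NPS.
needs_response = [r for r in reviews if r["sentiment_score"] < -0.5]
print([r["id"] for r in needs_response])  # [1, 3]
```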

We now have the reviewer's name and possibly their email address (PII) stored in our Snowflake database. Along with this, we have a categorization of the review based on the LLM's response, a sentiment score, and an indicator of the frequency of this type of review derived from our historical review vector database.
At this point, the outputs can include a dashboard that displays the review categories and provides a time-based analysis to determine if a particular product or service is receiving more or fewer reviews over time. When combined with financial data, this can reveal significant trends and insights that were previously unattainable on a large scale.
Another possible output is a curated response to the reviewer, assuming you have the legal right to store and use the reviewer's PII for this purpose.
I’ve developed a Matillion pipeline that takes the reviewer's name, email address, and the review itself and integrates an LLM into the pipeline for a second time. Using Cortex ensures that PII data, such as names and email addresses, are processed securely within Snowflake's ecosystem.
In these Cortex integrations, Matillion allows me to use models like Llama, Mistral, and Snowflake's own Arctic, among others. Here, I'm using Llama, and you can see an example of the system and user prompt below:
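To give a sense of what such a prompt pair might contain, here is a hedged sketch. The wording, the "Acme" brand name, and the `user_prompt` helper are all placeholders I've invented for illustration; the actual prompts used in the pipeline may be worded quite differently.

```python
# Illustrative system/user prompt pair; "Acme" is a placeholder brand.
SYSTEM_PROMPT = (
    "You are a customer support agent for Acme. Write a warm, concise reply "
    "that addresses every concern raised in the review. Use the reviewer's "
    "name and refer to the specific issues they mention. Do not invent facts."
)

def user_prompt(name: str, review: str) -> str:
    return f"Reviewer name: {name}\nReview: {review}\n\nWrite a personalized response."

print(user_prompt("Sam", "The app crashes every time I open the settings page."))
```

The system prompt fixes the tone and guardrails once, while the user prompt injects the per-review PII and text, which is why running this step inside Cortex matters.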

The results are fantastic. The LLM not only responds to each element of the review but also uses PII data to make it highly personalized.
Once I’ve exported this data to a table or appended it to an existing one, I can use the Email Sending components in Matillion to send mass responses to customers who have taken the time to leave reviews.
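For a rough idea of what each outgoing message contains, here is a sketch that assembles (but does not send) a reply using Python's standard library. The sender address and subject line are placeholders; in the real pipeline, Matillion's Email Sending component handles delivery.

```python
from email.message import EmailMessage

def build_reply_email(to_addr: str, name: str, body: str) -> EmailMessage:
    """Assemble (but do not send) a reply email. Sending is handled by
    Matillion's Email Sending component in the actual pipeline."""
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["From"] = "support@example.com"   # placeholder sender address
    msg["Subject"] = f"Thank you for your review, {name}"
    msg.set_content(body)
    return msg

msg = build_reply_email("sam@example.com", "Sam",
                        "Hi Sam, we're sorry about the crash you hit...")
print(msg["Subject"])
```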

At this stage, one final review by our customer success team can ensure quality. This team no longer needs to spend all day categorizing reviews and writing responses; instead, they can focus on high-value customer success tasks that help drive your business toward its goals.
This streamlined process uses Matillion, Snowflake, and LLM integrations to manage this task efficiently, safeguarding your brand reputation—all without a single line of code!
Joe Herbet
Enterprise Sales Engineer