Crafting a hyper-personalized marketing campaign: Integrating multiple datasets for a live recommendation engine

In part one of this blog series, we discussed how to connect to source data, use CDC within Matillion, and integrate with any REST API to load all necessary data into our target cloud data warehouse. This setup is crucial for creating our hyper-personalized marketing campaign. 

Now, it's time to develop our marketing campaigns using Matillion's transformation capabilities and Large Language Model (LLM) integrations.

Let's begin by organizing the data properly. First, we need to determine the weather conditions in our customers' locations. For instance, “Fairly Sunny” in Manchester might sound dubious, but it's crucial to know which products sell best under these weather conditions globally.

To do this, we use Matillion's Aggregate component to compile the data. Here's a sample of the data:

With that, I’ll understand how much of a particular product line has been sold across my estate under specific weather patterns, but to create the most effective hyper-personalized outreach, I need to determine which product to promote. 
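The aggregation step above can be sketched in plain Python. This is an illustrative stand-in for what the Aggregate component produces, with hypothetical field names:

```python
from collections import defaultdict

# Hypothetical sample of sales records; field names are illustrative only.
sales = [
    {"product_line": "Umbrellas", "weather": "Rainy", "quantity": 12},
    {"product_line": "Umbrellas", "weather": "Fairly Sunny", "quantity": 3},
    {"product_line": "Sunglasses", "weather": "Fairly Sunny", "quantity": 20},
    {"product_line": "Sunglasses", "weather": "Fairly Sunny", "quantity": 15},
]

# Sum quantity sold, grouped by product line and weather condition --
# the same shape of result the aggregation step yields.
totals = defaultdict(int)
for sale in sales:
    totals[(sale["product_line"], sale["weather"])] += sale["quantity"]

for (product_line, weather), qty in sorted(totals.items()):
    print(f"{product_line} | {weather} | {qty}")
```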

Let's leverage some classic data science and machine learning, along with basic mathematics such as standard deviation.

For those who may not remember their school math, no need to worry—Matillion contains all the useful aggregations that Snowflake can perform within its aggregate component. This allows me to easily identify the ideal product to promote when grouped by product, weather type, and any level of detail I choose.
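As a minimal sketch of how standard deviation can help pick the product to promote: one plausible scoring rule (my interpretation, not necessarily the exact one used in the pipeline) favours products that sell both strongly and consistently under a given weather condition, i.e. high mean, low standard deviation:

```python
import statistics

# Hypothetical per-transaction quantities, grouped by (product, weather).
quantities = {
    ("Sunglasses", "Fairly Sunny"): [20, 15, 18, 22],
    ("Umbrellas", "Fairly Sunny"): [3, 30, 1, 2],
}

# Score = mean minus standard deviation: rewards strong, consistent sellers
# and penalises erratic ones.
def score(qty):
    return statistics.mean(qty) - statistics.stdev(qty)

weather = "Fairly Sunny"
candidates = {p: score(q) for (p, w), q in quantities.items() if w == weather}
best_product = max(candidates, key=candidates.get)
print(best_product)
```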

Once I know which product to push, I'll filter down to just that product. However, I need to stick to my bespoke pricing model, which accounts for the individual prices each customer pays for our products. To do this, I will group the account and product together to determine the price each customer pays—because who doesn't love finely-tuned pricing models 😉?
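Grouping account and product together to recover each customer's bespoke price (and, as used below, their maximum previous purchase quantity) can be sketched like this, again with hypothetical data:

```python
# Hypothetical order history for the promoted product; field names illustrative.
orders = [
    {"account": "Acme Ltd", "product": "Sunglasses", "unit_price": 9.50, "quantity": 20},
    {"account": "Acme Ltd", "product": "Sunglasses", "unit_price": 9.50, "quantity": 35},
    {"account": "Globex", "product": "Sunglasses", "unit_price": 11.00, "quantity": 10},
]

# Group by (account, product): keep each customer's bespoke unit price and
# the maximum quantity they have previously purchased.
pricing = {}
for o in orders:
    key = (o["account"], o["product"])
    entry = pricing.setdefault(key, {"unit_price": o["unit_price"], "max_qty": 0})
    entry["max_qty"] = max(entry["max_qty"], o["quantity"])

print(pricing[("Acme Ltd", "Sunglasses")])
```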

With the product to promote identified, the price each customer pays, and the maximum quantity they have previously purchased, it's time to craft our content.

Because this project involves sensitive Personally Identifiable Information (PII) and pricing data, data sovereignty is of utmost importance to the sponsor. Therefore, I will keep the data within the Snowflake ecosystem and use Matillion's Cortex integrations to ensure that the data remains protected and secure in the cloud.

Configuring this LLM integration is quite simple. First, I select a model available in my region—I prefer using Llama, finding it more stable than Mistral. 

With Matillion, it's possible to adjust the temperature settings, and after some experimentation, I set it to 0.3. One of the perks of Matillion is that the sample output appears directly in my browser, so there's no need to run the entire pipeline to view results. This feature lets me see what the LLM generates as I tweak the hyper-parameters in real time.

To work with the prompt, I've found that LLMs typically take a single input, but I have multiple fields I want the LLM to use. By using Construct-Variant, I can effectively pivot or transpose my data, grouping all column names and their associated data into individual variant data types.

This approach is particularly useful for LLM integration because it consolidates metadata (such as column names) within the variant data type itself. This means I don't need to worry about separately providing metadata references—the LLM can readily interpret the data as it has all the necessary context.
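The effect of pivoting a row into a single variant is easy to see in miniature: fold all column names and values into one JSON-style payload, so the column names travel with the data. The column names below are illustrative:

```python
import json

# One flat row of prepared data (hypothetical column names).
row = {
    "CUSTOMER_NAME": "Jane",
    "PRODUCT": "Sunglasses",
    "UNIT_PRICE": 9.50,
    "MAX_QUANTITY": 35,
    "CURRENT_WEATHER": "Fairly Sunny",
}

# Fold all column names and values into a single VARIANT-style payload,
# so the LLM receives the metadata (column names) alongside the values.
variant = json.dumps(row)
print(variant)
```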

Additionally, I use specific system and user prompts to fine-tune the LLM's responses to ensure they accurately fit our needs. For example, I instruct the LLM to write an email without generating a subject line, as I have that part handled separately. Controlling the temperature in this setup ensures the LLM adheres to these instructions reliably.

Now, the VARIANT that I’m feeding in contains the price point, the quantity previously purchased, and the current weather in that location. This helps make the email feel highly personalized to the reader.
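Putting the pieces together, here is a hedged sketch of how the system prompt, user prompt, and variant payload might be assembled before being handed to the model. The prompt wording is illustrative, not the exact text used:

```python
import json

# System prompt: constrain the model's behaviour (no subject line, since
# that is generated separately from the weather data).
system_prompt = (
    "You are a marketing copywriter. Write the body of a short, friendly "
    "email. Do not generate a subject line."
)

# The VARIANT payload: price point, previous quantity, current weather.
variant = json.dumps({
    "CUSTOMER_NAME": "Jane",
    "UNIT_PRICE": 9.50,
    "MAX_QUANTITY": 35,
    "CURRENT_WEATHER": "Fairly Sunny",
})

# User prompt: the instruction plus the data, in one message.
user_prompt = (
    "Using the following customer record, write a personalised offer "
    f"referencing their price, past purchase volume, and local weather:\n{variant}"
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]
print(messages[1]["content"])
```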

You could take this a step further, tailoring your language model to reference any other datasets available to you and incorporating specific styles in the User Prompt or System Prompt instructions. If you have strong branding guidelines, this is an excellent opportunity to include a high-level summary of them.

Now that I have my content, email address, name, and subject header (based on the customer’s current weather), I can prepare everything for an email. Perhaps I want to send it to my marketing team for review before sending it out. No problem. I can simply write it to a new table and share it with them for final checks before hitting send!
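As a final illustrative stand-in (using SQLite locally; in practice this would be a Snowflake table written by the pipeline), staging the generated emails for marketing review might look like:

```python
import sqlite3

# Illustrative review table: stage the generated emails before sending.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE email_review (email_address TEXT, customer_name TEXT, "
    "subject TEXT, body TEXT)"
)
conn.execute(
    "INSERT INTO email_review VALUES (?, ?, ?, ?)",
    ("jane@example.com", "Jane", "Making the most of Fairly Sunny days",
     "Hi Jane, ..."),
)
conn.commit()

# Marketing can now query this table for final checks before hitting send.
pending = conn.execute("SELECT COUNT(*) FROM email_review").fetchone()[0]
print(pending)
```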

Joe Herbet

Enterprise Sales Engineer
