The Business Value Artificial Intelligence Adds to Data Pipelines

As Artificial Intelligence (AI) grows more powerful, business and technical leaders find themselves asking how to apply these new capabilities. Even as the potential for greater productivity expands daily, end users still struggle to find ways to apply this newfound power in their day-to-day tasks.

Here at Matillion, we’ve been hard at work adding AI to both our platform and the data pipelines our customers build. These pipelines orchestrate 1) the movement of data into Cloud Data Platforms like Snowflake and Databricks and 2) the transformation of that data into business-ready insights.

Given Matillion’s new AI capabilities, the question remains how best to add AI into these analytic processes. And in many cases, the answer to adding business value lies in augmenting existing data movement and transformation processes. The following discusses the value AI adds to the data integration process.

Common Data Pipelines

First, consider a common use case as depicted below. Here we see the various data silos belonging to a hypothetical company called GreenWave Technologies, a sustainable agriculture tech company specializing in innovative farming solutions. The data sources they’re interested in include:

  • CRM data in Salesforce,
  • Customer 360 data in a file extracted from its source system, as well as
  • Point-of-Sale data stored in a PostgreSQL database.

Pipeline to integrate data silos

This pipeline extracts data from those sources, loads the results into Snowflake, and then runs the transformations necessary to make that data business-ready. At the highest level, these transformations cleanse, unify, and aggregate the extracted data, preparing it for visualization and giving GreenWave employees, from analysts to executives, the ability to garner insights from the results. In this case, information from each data source is needed to generate executive dashboards summarizing basic sales performance metrics like customer or product profitability.

Dashboards based on transformed and integrated data
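
To make the cleanse, unify, and aggregate steps concrete, here is a minimal Python sketch of that kind of transformation. It is an illustration only, not how a Matillion pipeline is built: the field names (account_id, revenue, cost), the sample rows, and the account "Sunrise Farms" are all hypothetical, and in practice this logic would run as transformations inside Snowflake.

```python
from collections import defaultdict

# Hypothetical rows standing in for data already loaded from each source.
crm_accounts = [
    {"account_id": "A1", "account_name": " Express Logistics and Transport "},
    {"account_id": "A2", "account_name": "Sunrise Farms"},
]
pos_sales = [
    {"account_id": "A1", "revenue": 1200.0, "cost": 800.0},
    {"account_id": "A1", "revenue": 300.0, "cost": 250.0},
    {"account_id": "A2", "revenue": 500.0, "cost": None},  # incomplete row
    {"account_id": "A2", "revenue": 700.0, "cost": 400.0},
]


def profit_by_account(accounts, sales):
    """Cleanse, unify, and aggregate sales rows into profit per account name."""
    # Cleanse: trim stray whitespace from account names.
    names = {a["account_id"]: a["account_name"].strip() for a in accounts}
    totals = defaultdict(float)
    for row in sales:
        # Cleanse: skip rows missing a value needed to compute profit.
        if row["revenue"] is None or row["cost"] is None:
            continue
        # Unify: resolve the shared key to the CRM account name.
        name = names.get(row["account_id"], "Unknown")
        # Aggregate: sum profit per account.
        totals[name] += row["revenue"] - row["cost"]
    return dict(totals)


result = profit_by_account(crm_accounts, pos_sales)
```

The resulting dictionary of profit per customer is the shape of data a customer-profitability dashboard would read from.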

Predictive Analytics with Machine Learning

So far, so good. We’ve used a backward-looking data integration process to understand how GreenWave’s business is doing. Yet the descriptive statistics discussed above don’t provide a predictive analysis. We can do better by incorporating Machine Learning (ML). Below we see a company-wide measurement of sales performance in the form of Total Monthly Profit. And yet, with an incomplete month represented in the last data point, GreenWave as a company may be left wondering where the month will end up. Enter the forecast generated by machine learning algorithms, which executes as part of our data pipeline. Matillion can do this in a number of different ways, namely using our native Python execution capabilities or by leveraging Snowflake’s Cortex ML functions.

Using either method, these ML capabilities predict where this and future reporting periods will end up. The dashed line below represents the prediction, which is created and refreshed as the last step in our data pipeline’s execution. Executing the machine learning step as part of the overall data pipeline ensures the prediction algorithm is always working with the latest data. This gives the business the earliest possible warning of a problematic sales outcome and provides as much time as possible to intervene.

Profit chart enhanced by ML predictions
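
As a rough illustration of what a forecasting step in a Python stage might look like, the sketch below fits a straight-line trend to completed months and extends it forward. This is a deliberately simple stand-in, not the model Matillion or Snowflake Cortex uses; the function names and the profit figures are hypothetical.

```python
def fit_linear_trend(values):
    """Least-squares fit of values against 0..n-1; returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two completed periods")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast(values, periods):
    """Extend the fitted trend line for the given number of future periods."""
    slope, intercept = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(periods)]


# Hypothetical total monthly profit for the completed months only;
# the incomplete current month is left out of the fit.
completed_months = [100.0, 110.0, 120.0, 130.0]
predicted = forecast(completed_months, periods=2)
```

Because the fit is recomputed from whatever completed months are passed in, rerunning it at the end of each pipeline execution keeps the prediction aligned with the latest loaded data.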

Sentiment Analysis in Data Pipelines

Machine learning has been around for a number of years, so let’s turn now to the latest in generative AI (GenAI) technology: the Large Language Model (LLM). Below we’ll see how LLMs can be incorporated into a data pipeline. In fact, those pipelines are still necessary to prepare the data we want to use in our LLM use case. The Sales Performance data discussed above describes GreenWave’s previous performance and forecasts future performance. The pipeline below integrates sentiment analysis with those insights.

Data Pipeline Incorporating Sentiment Analysis

Here we have an Excel file filled with roughly one thousand customer reviews of GreenWave’s products.  These reviews run the gamut from very negative to very positive. For example, consider the two (fabricated) reviews below:

Positive Review Example:

I absolutely love using GreenWave Technology's Voya Sildax Crop Rotation Planning Software! It has revolutionized how I approach my farming practices, allowing me to improve soil health, prevent diseases, and optimize crop yields effortlessly. This software is an invaluable tool for any farmer looking to implement sustainable and efficient crop rotation schedules.

Negative Review Example:

This Move-Lab Precision Farming Drone from GreenWave Technology was an absolute disappointment. Despite its promises of advanced imaging and mapping technology, the drone failed to accurately analyze crop health or identify pest infestations. The precision touted in the product description was nowhere to be found as the data and field conditions provided were extremely inaccurate. Overall, a waste of time and money for any farmer looking to improve their crop yields.

The above is an example of unstructured textual data containing actionable but untapped business value. Matillion’s OpenAI Prompt component provides the key to unlocking this value. Once these reviews are loaded into Snowflake, the data pipeline can be pointed at them and OpenAI can categorize the sentiment in each review. Here’s the prompt used to gather these insights:

You're a marketing analyst reviewing user product reviews from GreenWave Technology, a leading sustainable agriculture tech company that specializes in developing innovative solutions to promote sustainable farming practices. Based on the sentiment of the Product Review, characterize the Product Review as "Negative", "Mixed", or "Positive".
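
For readers who think in code, the sketch below shows one way a prompt like this could be paired with each review and the model's reply reduced to one of the three allowed labels. It does not describe how the OpenAI Prompt component works internally. The helper names are hypothetical, and `complete` is an assumed placeholder for whatever function sends a prompt to a model and returns its text.

```python
ALLOWED_LABELS = ("Negative", "Mixed", "Positive")

# The instruction text from the article, kept as one string.
SYSTEM_PROMPT = (
    "You're a marketing analyst reviewing user product reviews from GreenWave "
    "Technology, a leading sustainable agriculture tech company that specializes "
    "in developing innovative solutions to promote sustainable farming practices. "
    "Based on the sentiment of the Product Review, "
    'characterize the Product Review as "Negative", "Mixed", or "Positive".'
)


def build_prompt(review):
    """Attach a single review to the shared instructions."""
    return f"{SYSTEM_PROMPT}\n\nProduct Review: {review}"


def normalize_label(raw_response):
    """Map a model reply onto one allowed label, or None if it does not match."""
    cleaned = raw_response.strip().strip('".').lower()
    for label in ALLOWED_LABELS:
        if cleaned == label.lower():
            return label
    return None


def classify(review, complete):
    """Classify one review; `complete` is any callable: prompt text -> reply text."""
    return normalize_label(complete(build_prompt(review)))
```

Returning None for an unexpected reply, rather than guessing, lets a downstream step count or re-run the reviews the model did not label cleanly.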

With those sentiments identified and loaded into Snowflake, we can then apply data transformations to associate the frequency of negative opinions with the Sales performance data we’ve already generated.  Here’s an example of the insights this might produce:

Merging insights obtained from sentiment analysis

The visualization above identifies an outlier account with a substantially higher amount of negative sentiment. Further, we can see that this negative sentiment is clustered within GreenWave’s third largest and most important customer, Express Logistics and Transport. This gives GreenWave the ability to target its Customer Success activities where they will move the needle the most, namely by proactively aligning product experts with Express customers.
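
The merge described here can be sketched in a few lines of Python: count the share of negative reviews per account, then join that share to the profit figures so the accounts with the most negative sentiment surface first. The record layout, the sample values, and the account "Sunrise Farms" are hypothetical, and this is only an approximation of what the pipeline's transformations would do in Snowflake.

```python
from collections import Counter


def negative_share_by_account(labeled_reviews):
    """Fraction of each account's reviews that were labeled Negative."""
    totals = Counter()
    negatives = Counter()
    for review in labeled_reviews:
        totals[review["account"]] += 1
        if review["sentiment"] == "Negative":
            negatives[review["account"]] += 1
    return {account: negatives[account] / totals[account] for account in totals}


def merge_with_profit(shares, profit):
    """Join sentiment shares to profit, most negative accounts first."""
    rows = [
        {
            "account": account,
            "profit": profit[account],
            # Accounts with no reviews get a share of 0.0.
            "negative_share": shares.get(account, 0.0),
        }
        for account in profit
    ]
    return sorted(rows, key=lambda row: row["negative_share"], reverse=True)


# Hypothetical inputs: labeled reviews and previously computed profit.
reviews = [
    {"account": "Express Logistics and Transport", "sentiment": "Negative"},
    {"account": "Express Logistics and Transport", "sentiment": "Negative"},
    {"account": "Express Logistics and Transport", "sentiment": "Negative"},
    {"account": "Express Logistics and Transport", "sentiment": "Positive"},
    {"account": "Sunrise Farms", "sentiment": "Positive"},
    {"account": "Sunrise Farms", "sentiment": "Mixed"},
]
profit = {"Express Logistics and Transport": 450.0, "Sunrise Farms": 300.0}

ranked = merge_with_profit(negative_share_by_account(reviews), profit)
```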

So by leveraging sentiment analysis within our data pipelines and by integrating Artificial Intelligence with Business Intelligence, GreenWave generated actionable insights with tangible business impact. And they did so more quickly, affordably, and reliably than any manual process would allow.

Limitless Possibilities

Forecasting and sentiment analysis like the examples above are only the beginning. Summarization, classification, and scoring use cases abound, among many others. Integrating machine learning and generative AI into data pipelines offers businesses unprecedented opportunities to enhance their analytics processes and drive value.

By incorporating machine learning algorithms, such as predictive forecasting, and leveraging LLMs for sentiment analysis, companies can uncover actionable insights from previously untapped sources of data. The case of GreenWave Technologies illustrates how AI augments existing data integration processes, enabling businesses to not only understand past and present sales performance but also anticipate future trends and include customer sentiments within their business strategies.

If you’d like to see more of Matillion’s AI capabilities in action and how they can be incorporated into your business, sign up for a demo.

David Lipowitz

Senior Director, Sales Engineering