Reverse ETL data from Snowflake back to SQL Server using Matillion’s Output Components

This is part two of a three-part series about Matillion’s capabilities with Snowflake and Microsoft SQL Server. Please refer to Part 1 of this series, where we covered ingesting data from MSSQL into Snowflake. In this part, we cover pushing data from Snowflake back to MSSQL, a process known as Reverse ETL. Note that the data must first exist in a Snowflake table before you can push it to MSSQL.

What is Reverse ETL?

We all know that making changes in an organization is difficult. It is even more difficult when the changes involve technology that other technology or business applications depend on. This is typically the case with transactional RDBMS instances like Microsoft SQL Server.

While many have made the leap to the cloud and adopted modern data platforms such as Snowflake, the scope of that change often doesn’t include moving off of an RDBMS like MSSQL, because these databases are frequently the data sources for legacy applications that are not suited to or compatible with the cloud. These applications have typically been around for years, if not decades, and ripping out their core database to move to the cloud is a huge, if not impossible, task for many data teams. These on-prem RDBMS instances aren’t elastic like their cloud counterparts, yet customers and internal stakeholders alike continue to demand new data capabilities and enhancements. So, how can we help?

Well, one way we can help is by relieving the MSSQL instance of any analytical or computational workloads so it can focus solely on transactional events from the underlying application. We can follow the steps from part 1 of our series to replicate the required transactional data from MSSQL into Snowflake, use the power and elasticity of the data cloud to transform, aggregate, and calculate the data at scale, and then simply send inserts/updates back to MSSQL using Matillion’s output components.

This activity is commonly referred to as “reverse ETL” in the data engineering space, and it is a trivial task to achieve in Matillion’s Data Productivity Cloud!

What is in Scope

Before we jump into our walk-through, it is important to set the scope of what Matillion offers in regard to MSSQL.

In scope:

  • Pulling data from MSSQL into cloud data platforms like Snowflake
  • Pushing data to MSSQL from cloud data platforms like Snowflake

Not in scope:

  • Creating transformation jobs in Matillion to be executed in MSSQL
  • Transforming data in-flight between MSSQL and cloud data platforms like Snowflake
  • Executing custom scripts on MSSQL
  • Pushing ‘ad-hoc’ data on the fly without it first existing in a Snowflake table

Configuring the Microsoft SQL Server Output component

Pushing a single table from Snowflake to MSSQL

In an orchestration pipeline, we will start with the Microsoft SQL Server Output component to send data from a Snowflake table to our SQL Server database.

For this component, the first set of properties pertains to your MSSQL instance that you want to connect to. You can read more about these fields in our documentation here: https://docs.matillion.com/data-productivity-cloud/designer/docs/sql-server-output/

Once you have entered the required MSSQL information, the next set of properties pertains to what data you want to send from Snowflake to MSSQL. 

In this scenario, our legacy application captures customer reviews in MSSQL. However, sentiment analysis on these reviews is difficult and compute-intensive against our on-prem MSSQL instance. So we have replicated the review data into Snowflake and used Snowflake Cortex’s sentiment function to calculate a negativity rate for each of our product suppliers, landing the results in a table called item_sentiment_cortex_summary. Now we want to send this analysis data back to MSSQL so our application engineers can join to it within our legacy app and provide a new feature to our application customers.
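To give a sense of how such a summary table might be built, here is a sketch in Snowflake SQL. The source table and column names (customer_reviews, supplier_id, review_text) are illustrative assumptions; SNOWFLAKE.CORTEX.SENTIMENT is the real Cortex function and returns a score between -1 (negative) and 1 (positive):

```sql
-- Illustrative sketch: build a per-supplier negativity summary in Snowflake.
-- The customer_reviews table and its columns are hypothetical;
-- SNOWFLAKE.CORTEX.SENTIMENT returns a score between -1 and 1.
CREATE OR REPLACE TABLE item_sentiment_cortex_summary AS
SELECT
    supplier_id,
    COUNT(*) AS review_count,
    -- Share of reviews with a negative sentiment score
    AVG(CASE WHEN SNOWFLAKE.CORTEX.SENTIMENT(review_text) < 0
             THEN 1 ELSE 0 END) AS negativity_rate
FROM customer_reviews
GROUP BY supplier_id;
```

Running this aggregation in Snowflake keeps the compute-intensive sentiment scoring entirely off the on-prem MSSQL instance.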

Let's select our item_sentiment_cortex_summary table in Snowflake in our Source Table property. We want the table in MSSQL to have a dpc_ prefix, so we will set the value in our Target Table property to dpc_item_sentiment_cortex_summary. We will also choose all of our columns in the Load Columns property.

For Table Maintenance, we will choose Create If Not Exists. This will create the table on the fly in the schema we specified, which is a nice feature since we don’t have to worry about creating the target table in MSSQL ourselves. The last property we will set is Truncate Target Table to True, because we want to truncate/insert this table every time the pipeline runs. You can also increase the batch size if you are moving a lot of data, but we will keep it at 5,000 for now. Our component should be in a valid state once all of the properties are set.
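With these settings, the component’s behavior on the MSSQL side is roughly equivalent to the following T-SQL. This is a sketch of the pattern, not the component’s exact statements, and the column definitions are illustrative (the component derives them from the Snowflake source table):

```sql
-- Rough T-SQL equivalent of "Create If Not Exists" + "Truncate Target Table".
-- Column definitions are illustrative assumptions.
IF OBJECT_ID('dbo.dpc_item_sentiment_cortex_summary', 'U') IS NULL
    CREATE TABLE dbo.dpc_item_sentiment_cortex_summary (
        supplier_id     INT,
        review_count    INT,
        negativity_rate FLOAT
    );

TRUNCATE TABLE dbo.dpc_item_sentiment_cortex_summary;

-- Rows from Snowflake are then inserted in batches
-- (5,000 rows per batch in our example).
```

The truncate/insert pattern is a simple full refresh: every run replaces the table’s contents with the current state of the Snowflake source.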

As proof, here is our MSSQL database showing that this table does not exist before our pipeline has run in Matillion:

Now let’s run our newly created MSSQL Output pipeline. You can see that the pipeline ran successfully, and we now have a newly created table in our dbo schema with 11 records loaded from Snowflake.
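If you want to verify the load yourself, a quick check against the MSSQL database might look like this (assuming the default dbo schema, as in our walkthrough):

```sql
-- Confirm the table now exists and count the loaded rows
SELECT COUNT(*) AS row_count
FROM dbo.dpc_item_sentiment_cortex_summary;
```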

And that’s it! With a few properties set on the Output component, you can send critical data back to Microsoft SQL Server instances to enhance your legacy applications, while still leveraging the power of Snowflake and Matillion to do the heavy lifting and keeping your transactional database stable for your apps.

Learn more about how SQL plays a central role in reverse ETL pipelines. 

Stay tuned for the final part of our three-part series, where we talk about how Matillion interacts with MSSQL’s CDC offerings. Until then, feel free to reach out if you have any questions or comments or just want to say hi 👋

Check out part one here

Konrad Bafia

Manager, Sales Engineering
