How to use Matillion's Database Query Component to ingest Microsoft SQL Server data in Snowflake

Problem statement:

While many organizations have begun or completed their journey to the cloud, there is still a need to operate traditional RDBMS environments, like MS SQL Server, on-prem. Legacy applications may still depend on these instances, yet their data remains siloed from the broader reporting workloads now running on data clouds such as Snowflake.

This siloed data can lead to inaccurate, slow, or inconsistent reporting, which can impact the broader business. Data teams are tasked with bringing this data over to Snowflake. However, many of the traditional tools available to them do not offer the speed, ease of use, and reliability of the Matillion Data Productivity Cloud (DPC).

This article provides a quick walk-through of what DPC offers for users working with MS SQL Server data and Snowflake! This is a three-part series covering the following:

Part 1 - Ingesting SQL Server data using the Database Query component (basic and advanced)

Part 2 - Reverse ETL to write data from Snowflake back to MS SQL (coming soon)

Part 3 - Matillion’s CDC offering for MSSQL (coming soon)

Configuring the Database Query component for MSSQL

Pulling a single table from MSSQL to Snowflake

In an orchestration pipeline, we will use the Database Query component to connect to our SQL Server database and pull data into Snowflake. This component offers multiple drivers for SQL Server, as well as drivers for other RDBMSs like Oracle, PostgreSQL, and MySQL.

In this configuration, we will use Basic mode, which guides you through drop-down prompts to select the data you want to move from MSSQL to Snowflake. The prompts populate lists of values (from MSSQL [source] and Snowflake [target]) to reduce user error and ensure successful pipeline runs. This also enables self-service: users without deep technical knowledge can build effective pipelines with little to no support. For this example, I will extract the CUSTOMERS table from MSSQL and load it into Snowflake.

One of the best features of Matillion is that you do not have to create the target table before loading data. Instead, we simply tell Matillion where to put the sourced data in Snowflake. This is done in the Destination tab of the Database Query component properties. Matillion infers the appropriate schema on the Snowflake side from the metadata coming from MSSQL, creates the table on the fly in Snowflake, and loads it, all in one motion. Talk about productivity!

The component is now ready to run. When complete, Matillion will have created a new table named stg_mssql_customers in Snowflake.
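As a quick sanity check after the run, you can query the new table from a Snowflake worksheet (the table name follows the example above):

```sql
-- Verify the load in Snowflake (illustrative check).
SELECT COUNT(*) AS row_count
FROM stg_mssql_customers;
```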

Pulling multiple tables using dynamic SQL and iterators

While there may be use cases where you want to sync one-off tables from MSSQL to Snowflake as shown above, it is more likely that you will want to sync dozens, if not hundreds, of tables. Doing that on a table-by-table basis is not an effective development approach, so a combination of variables, iterators, and dynamic SQL is a better, more modular design.

Matillion supports many variable types, one of which is the pipeline variable, essentially a scalar value that is local to a pipeline. In this example, I will create two variables (jv_table_name and jv_table_schema) to store table names and schema values, respectively.

I will also reuse the Database Query component in my pipeline, but instead of Basic mode, I will use Advanced mode, which allows me to write a SQL query against the source MSSQL database. This is also where I will reference my variables to make the SQL statement dynamic.

Note the ${variable_name} syntax used to reference the variables created earlier. Also notice how the target table name in the Destination tab references the ${jv_table_name} variable. This is important because each table has different metadata, so Matillion will create a new table for each combination, substituting the variable values at run time.
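As a sketch, an Advanced-mode query using these variables might look like the following (the variable names match the ones created above; the column list and any filtering are up to you):

```sql
-- Advanced-mode query in the Database Query component (illustrative).
-- Matillion substitutes ${jv_table_schema} and ${jv_table_name}
-- with the current variable values before sending the query to MSSQL.
SELECT *
FROM ${jv_table_schema}.${jv_table_name}
```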

Finally, for this to work, we need to attach an iterator on top of our Database Query component to define the loop Matillion should run through. Matillion offers many types of iterators for loop-based pipelines. For simplicity in this demo, we will use the Fixed Iterator, which loops over a fixed list of values.

By attaching the Fixed Iterator to our Database Query component, we can configure it to assign values to our variables at run time. We have set four combinations of schema and table name, which will ultimately create four new tables in Snowflake at run time, each populated with the appropriate data from MSSQL.

And that’s it! Watch the video below to see a recorded demo of this taking place.


Imagine doing this on a larger scale. For example, you can configure Matillion to query the MSSQL information_schema for the list of tables in the schemas you want to copy over, then systematically pass that metadata into your variables to loop over a job like this and clone an entire database from MSSQL to Snowflake in a few hours. This is a common use case that we have solved hundreds of times across many customers. Give Matillion a spin by signing up for a free trial to see how quickly you can solve this use case in your environment. Start a trial today!
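For reference, the information_schema lookup described above might be sketched as follows (the schema names in the filter are placeholders for your own):

```sql
-- Illustrative MSSQL query to build the list of tables to loop over.
-- 'dbo' and 'sales' are example schema names; substitute your own.
SELECT TABLE_SCHEMA,
       TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
  AND TABLE_SCHEMA IN ('dbo', 'sales');
```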

Konrad Bafia

Manager, Sales Engineering
