Building a dynamic and reusable Matillion transformation with Grid Variables

Matillion’s Data Productivity Cloud is a productivity platform for data engineers. In this blog, I’ll show how to build a simple transformation and extend it for further use cases with Matillion Grid Variables

Grid variables are 2-D Arrays similar to database tables on a smaller scale. They can store information like table schemas, key-value pairs, or even a small table of data.

In this blog, you’ll learn how to use grid variables to make a somewhat complex data transformation dynamic and reusable.

Use case

Consider a source dataset replicated from an on-prem database to Snowflake which includes a column representing a list of IDs, comma delimited. The number of IDs in the column is variable and, for example, could include 1, 2, 3, or more IDs within a single cell. In Snowflake, I would like to have each row of my dataset to be associated with a single ID. For this example, a small sample dataset can be created with a Fixed Flow Transformation Component. Setting the stage for a larger example, consider a case where there are many tables with this structure of multiple IDs in a single cell. We’ll eventually want to build a transformation that can be reused whenever I have a table that needs this split.

Splitting IDs for a Single Table

When building a reusable data pipeline in Matillion, start with a base case for a single table. 

We need to:

1. Split the field into multiple columns

2. Transpose the columns to rows

3. Clean up and normalize our data for our final data model.

When configuring the transpose, you need to define exactly the number of transposed columns. This works for our simple example but would break if more than 4 ids were in a single cell.

Leverage Grid Variables to make the ID splitting dynamic

We need to add some dynamic functionality to our transformation to account for a variable number of ids stored in a single cell. It’s also important to be able to use this function with more than this single table. As such, everything will be parameterized.

1.Use Query Result to Scalar and Query Result to Grid components to instantiate the Scalar and Grid Variables that will be used in our split_rows Transformation. 

2. Pass all variables to the split_rows_parameterized Transformation

3. Define our transformation logic with variable input

Starting with our table input, all of the references to columns in the transformation logic will now be defined by grid variables.

Next, the Split Field

.. and the Transpose

Continuing with the Filter

. the Rename, which can drop columns

Last the Rewrite table

As a final step, I can add a layer of orchestration that allows me to pass in a table name, as well as the name of the column with the list of IDs that need to be parsed into their own rows.

The next time you build a Transformation in the Data Productivity Cloud, consider if the logic will be needed again later. If so, leverage Grid Variables to build a dynamic and reusable transformation!

Not using Matillion yet? Try a free trial today!

Additional resources: 

Using Grid Variables to customize Matillion’s task history

Fixed flow in focus: Simplifying data pipelines with static inputs

Angus Kinsey
Angus Kinsey

Enterprise Solutions Engineer

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.