Leverage Parameterization in Matillion for reusable data pipelines

In this blog, you’ll learn how to leverage parameterization in Matillion to reuse existing pipelines as your use cases evolve.

At Matillion, I frequently work with data teams that are either replicating and analyzing data on behalf of their customers, or replicating data from multiple instances of the same type of source system (say, Postgres), each with its own set of credentials. In this blog, we’ll walk through a brief example of making a complex reporting pipeline modular for reuse.

Setting the stage

Greenwave Technologies is a company that sells products for sustainable agriculture. Greg, a data engineer at Greenwave, has just finished building a data pipeline in the Matillion Designer that generates a profit and revenue analysis dataset for one branch of Greenwave’s sales org. Now Greg has been tasked with replicating that same dataset for all regional branches of the sales org.

Base case

Here’s the pipeline. Data is loaded from Salesforce, Postgres, and a JSON file in S3. Then a series of transformations is run to cleanse, flatten, and aggregate the data.
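To make the shape of the pipeline concrete, here’s a minimal Python sketch of the same extract-then-transform flow. Everything here is illustrative: the function names, sample records, and mapping file contents are hypothetical stand-ins, not Matillion components or Greenwave’s actual data.

```python
# Hypothetical sketch of the pipeline's shape: three extracts, then
# cleanse/flatten/aggregate. None of these names are Matillion APIs.

def extract_sources():
    """Stand-ins for the Salesforce, Postgres, and S3 JSON loads."""
    salesforce_accounts = [{"account_id": "A1", "name": "Acme Farms"}]
    postgres_orders = [
        {"order_id": 1, "account_ref": "ACME", "amount": 120.0},
        {"order_id": 2, "account_ref": "ACME", "amount": 80.0},
    ]
    # The JSON mapping file: Postgres order refs -> Salesforce account IDs
    s3_mapping = {"ACME": "A1"}
    return salesforce_accounts, postgres_orders, s3_mapping

def transform(accounts, orders, mapping):
    """Cleanse, flatten, and aggregate into a per-account revenue summary."""
    names = {a["account_id"]: a["name"] for a in accounts}
    totals = {}
    for order in orders:
        account_id = mapping.get(order["account_ref"])
        if account_id is None:
            continue  # cleanse step: drop orders with no account mapping
        totals[account_id] = totals.get(account_id, 0.0) + order["amount"]
    return [{"account": names[k], "revenue": v} for k, v in totals.items()]
```

In the real pipeline each of these steps is a Matillion component rather than a function, but the data flow is the same: the JSON file is the bridge between the Postgres order data and the Salesforce account data.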

Making it modular

Not wanting to rebuild and then maintain the same report for every branch of Greenwave, Greg decides to make his pipeline reusable. Here are the facts:

  • Every branch needs the exact same analysis done
  • There is one Salesforce account that governs all information about Greenwave customers (convenient!)
  • There’s only one JSON file that Greg needs to use to map the Postgres Point of Sale/Order data to the Salesforce Accounts data (convenient!)
  • Each branch of Greenwave has its own Postgres instance (aha!)

Greg decides to wrap his initial report in an orchestration that will iterate over the credentials for each branch’s bespoke Postgres instance.

The iterator loops over rows of metadata stored in a Snowflake table and maps the values to a series of pipeline variables.

The Snowflake table has columns representing the following:

  • Branch Name
  • Postgres Connection URL
  • Postgres Username
  • Postgres Password (the values in this column aren’t actual passwords; they’re references to named Secret Definitions in Matillion)
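The metadata-driven loop can be sketched as follows. This is a hypothetical illustration, not Matillion’s API: the row dictionaries stand in for the Snowflake table’s rows, and `run_branch_pipeline` stands in for the wrapped report pipeline receiving its variables.

```python
# Illustrative only: each row of the Snowflake metadata table becomes one
# set of pipeline variables, and the branch pipeline runs once per row.
# Table contents and function names are hypothetical.

branch_metadata = [  # stand-in for: SELECT * FROM branch_connections
    {"branch_name": "northeast",
     "pg_url": "jdbc:postgresql://ne-db:5432/sales",
     "pg_user": "ne_etl",
     "pg_secret_ref": "ne-postgres-password"},
    {"branch_name": "southwest",
     "pg_url": "jdbc:postgresql://sw-db:5432/sales",
     "pg_user": "sw_etl",
     "pg_secret_ref": "sw-postgres-password"},
]

def run_branch_pipeline(branch_name, pg_url, pg_user, pg_secret_ref):
    """Stand-in for the wrapped report; returns what it would run with."""
    return f"{branch_name}: {pg_url} as {pg_user} (secret: {pg_secret_ref})"

# The iterator: one pipeline run per metadata row.
runs = [run_branch_pipeline(**row) for row in branch_metadata]
```

The key design choice is that the loop is driven by data, not configuration: onboarding a new branch means inserting a row into the Snowflake table, with no change to the pipeline itself.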

There are only a few changes that need to be made to the pipeline at this point. Greg adds all of his variables to the Database Query component, commits his changes to GitHub, and calls it a day!

In this blog, you learned how Matillion makes it easy to adapt existing data pipelines to handle more data, and specifically more sources of data, as use cases evolve.

Not using Matillion yet? Check out a 2-week free trial (Snowflake included) today!

Angus Kinsey

Enterprise Solutions Engineer
