Blog

Matillion ETL Supports Data Lineage in Enterprise Versions

Nowadays, ‘Big Data’ is what defines competitive advantage for a business. Understanding the way your data flows is key in harnessing its power to make more consistent and confident decisions. To help you achieve this, Matillion offers the Data Lineage enterprise feature in all of our ETL products.

 

With this feature, you can track columns across the transformation process. This helps you understand the effects components have had on that data.

How to view Data Lineage

When selecting a component within a Transformation Job, a ‘Lineage’ tab comes up. This tab allows you to inspect the lineage for any fields that exist in that component up to that point. You can view your Data Lineage using a graphical representation or a SQL view which highlights the selected column.

 

When you generate the lineage and expand the result, you can see the column’s origin and any transformations that have occurred to it throughout the process.

 

An example Transformation Job

Let’s look at an example. In the sample Transformation Job below we applied some simple transformations to a public civil aviation dataset. This Transformation Job joins flight pattern data with airplane data, adding a ‘MANUFACTURER’ from the airplane data to the flight pattern data. The job is segmented into two flows: one for ‘good’ data and one for ‘bad’ data.

 

Data Lineage in Matillion: Transformation Job screen

 

Trace Your Data’s Journey

 

We can use the Data Lineage feature in Matillion to better understand the journey the ‘MANUFACTURER’ column has taken throughout this Transformation Job.

 

First, we follow the top flow, which deals with our ‘good’ data. We can see the simple path the ‘MANUFACTURER’ field took to reach the flight’s good EXT table.

 

When we generate the lineage and expand the result, we can see that the ‘MANUFACTURER’ data originated from the raw plane info table and had no transformations applied to it along the way.

 

Data Lineage: Get Lineage screen

 

We can also verify that by looking at the SQL.

 

Get Lineage SQL screen

 

Next, let’s follow the bottom flow, which deals with our ‘bad’ data. After we generate the lineage, we see a similar journey occurred to the ‘MANUFACTURER’ column as in the good flow. However, we now see that the Calculator component changed the ‘MANUFACTURER’ value to UNKNOWN.

 

Constants Unknown screen

 

By viewing the SQL, we can better understand exactly what happened to this column during this flow.

 

constants unknown sql screen

 

See the Matillion Data Lineage feature in action

 

With this simple example, you can see how Data Lineage lets you understand the transformations being applied to your data. This visibility builds confidence in your data. Also, Data Lineage can improve operational efficiency and ensure that you are in compliance with data security requirements and other regulations.

 

Watch Matillion Principal Architect Arawan Gajajiva walk you through this example:

 

 

Know the flow of your data with a click of a button. To get started with an enterprise version of Matillion, contact your Account Executive or reach out to sales@matillion.com.