Nowadays, ‘Big Data’ is what defines competitive advantage for a business. Understanding the way your data flows is key in harnessing its power to make more consistent and confident decisions. To help you achieve this, Matillion offers the Data Lineage enterprise feature in all of our ETL products.
With this feature, you can track columns across the transformation process. This helps you understand the effects components have had on that data.
How to view Data Lineage
When selecting a component within a Transformation Job, a ‘Lineage’ tab comes up. This tab allows you to inspect the lineage for any fields that exist in that component up to that point. You can view your Data Lineage using a graphical representation or a SQL view which highlights the selected column.
When you generate the lineage and expand the result, you can see the column’s origin and any transformations that have occurred to it throughout the process.
An example Transformation Job
Let’s look at an example. In the sample Transformation Job below we applied some simple transformations to a public civil aviation dataset. This Transformation Job joins flight pattern data with airplane data, adding a ‘MANUFACTURER’ from the airplane data to the flight pattern data. The job is segmented into two flows: one for ‘good’ data and one for ‘bad’ data.
Trace Your Data’s Journey
We can use the Data Lineage feature in Matillion to better understand the journey the ‘MANUFACTURER’ column has taken throughout this Transformation Job.
First, we follow the top flow, which deals with our ‘good’ data. We can see the simple path the ‘MANUFACTURER’ field took to reach the flight’s good EXT table.
When we generate the lineage and expand the result, we can see that the ‘MANUFACTURER’ data originated from the raw plane info table and had no transformations applied to it along the way.
We can also verify that by looking at the SQL.
Next, let’s follow the bottom flow, which deals with our ‘bad’ data. After we generate the lineage, we see a similar journey occurred to the ‘MANUFACTURER’ column as in the good flow. However, we now see that the Calculator component changed the ‘MANUFACTURER’ value to UNKNOWN.
By viewing the SQL, we can better understand exactly what happened to this column during this flow.
See the Matillion Data Lineage feature in action
With this simple example, you can see how Data Lineage lets you understand the transformations being applied to your data. This visibility builds confidence in your data. Also, Data Lineage can improve operational efficiency and ensure that you are in compliance with data security requirements and other regulations.
Watch Matillion Principal Architect Arawan Gajajiva walk you through this example:
Know the flow of your data with a click of a button. To get started with an enterprise version of Matillion, contact your Account Executive or reach out to email@example.com.