Using Grid Variables to customize Matillion’s task history

Using Task History to view details of all the components run in a Matillion pipeline is a great way to monitor the health of your pipeline. 

The Task History tab at the bottom of the UI shows a list of all tasks completed, running, queued, cancelled, or failed. These tasks can be expanded to show subtasks. Each row of the Task History has information on each Component executing in the pipeline. 

Matillion Data Productivity Cloud (DPC) provides a great platform for managing data pipelines, and grid variables are a powerful feature that can enhance your data transformation processes. 

For instance, here's how you can use grid variables to print out customer counts loaded into Snowflake from SQL Server when running a Matillion pipeline. The information on the Pipeline Name, Component, Start Time, Duration, Row Count, and Message is as pictured below:

This is all great information! Recently, I have also had several requests for more details. An example is to have visibility in the Task History as to how many records are loaded per customer—a great use case for Grid Variables!

Understanding Grid Variables

Grid Variables in Matillion are essentially two-dimensional arrays or tables with named columns. They can store multiple values and are particularly useful for handling metadata and passing data between components within a pipeline.

Setting Up a Grid Variable

In your Matillion DPC pipeline, select the Variables tab on the far left of your screen and click Add. Then:

  1. Select the Pipeline variable, and from the drop-down, select Grid. Then, select Next

2. Name the Grid Variable gv_results, leave Visibility as Public, Behavior as Shared, and then select Next

3. We want to track the Customer Name and the Record_Count, so click the + button once to add another row. Name the first Column NAME and the second Column RECORD_CNT, and select Next.

4. For this example, we won’t add default values. So, select Finish

Populating a Grid Variable

There are several ways to store data in a grid variable at runtime. In this case, I'll demonstrate the Query Result to Grid component.

The screenshot below shows a typical setup: executing a query against a table that has just been loaded using the Database Query Component.

1.To Configure the Query Results to Grid set:

Mode = Advanced

Query = select NAME, count(*) AS RECORD_COUNT

from "mssql_store_join"

group by NAME

order by RECORD_COUNT asc 

Grid Variable = gv_results

Grid Variable Mapping =

          NAME = NAME

          RECORD_COUNT = RECORD_COUNT 

The final configuration should look like this: 

Using Grid Variables in a Pipeline

You can use the Print Variables Component to set the Variables to Print to our grid variable, gv_results, and Include variable name to Yes

This will provide output in the Task Details as an array of NAME, RECORD_COUNT. 

For row-based details of customer counts, you can use a Python script within Matillion to iterate over the grid variable and output the data: 

The Python Script is:

source_list = context.getGridVariable('gv_results')
cnt=len(source_list)
#print(cnt)
loop_cnt=0
for x in source_list: 
 print("Company = " + source_list[loop_cnt][0]+" || Record Count = " + source_list[loop_cnt][1])
 if cnt > 1 and loop_cnt < cnt:
    loop_cnt +=1

This method provides much more concise and closely formatted details, as you can see from the following screenshot.

By leveraging grid variables, you can efficiently manage and manipulate data within Matillion DPC, ensuring a smooth flow of information and the ability to customize the output to your pipeline's Task History.

Sign up today to try out the Matillion Data Productivity Cloud yourself!

Alan Goodrich
Alan Goodrich

Enterprise Solution Engineer

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.