I’m often asked by prospective customers, “What job monitoring capabilities does Matillion ETL have?”
Matillion, of course, logs all server events, which users can download from the Admin menu. Additionally, there is the ‘Task History’ feature, which is accessible in the bottom-right pane or via the Project menu.
The Task History provides details of every component used, including job start and end times, any error messages generated, and how many records were processed. This level of detail, combined with the ability to filter the screen to show only failed sub-jobs, is extremely useful for monitoring and troubleshooting your Matillion ETL environment.
Although this level of detail is useful in many situations, sometimes customers want a higher-level view: an aggregated summary to answer questions such as:
- How many jobs failed in the last 24 hours?
- Is the runtime of my jobs increasing or decreasing over time?
- Is the number of records being processed increasing or decreasing over time?
- What is the average time it takes to run a particular job?
To answer these questions, you can export Task History to your target data warehouse. The Matillion API provides both a summary-level and a detailed-level view of these events, and the quickest way to export them is to create a Matillion job that writes the Task History to your data warehouse.
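As a starting point, a request to the Matillion REST API might look like the sketch below. The instance URL, group and project names, and credentials are placeholders, and the exact Task History endpoint path can differ between Matillion versions, so treat this as an assumption and check the API documentation on your own instance.

```python
# Hypothetical sketch: fetch Task History from the Matillion v1 REST API.
# BASE, group/project names, and credentials are placeholders, and the
# endpoint path is an assumption -- verify against your instance's API docs.
import base64
import json
import urllib.request

BASE = "http://matillion-host"  # assumption: your Matillion instance URL


def task_history_url(group: str, project: str) -> str:
    """Build the (assumed) Task History endpoint for a project."""
    return f"{BASE}/rest/v1/group/name/{group}/project/name/{project}/task"


def fetch_tasks(group: str, project: str, user: str, password: str):
    """Call the endpoint with HTTP Basic auth and return the parsed JSON."""
    req = urllib.request.Request(task_history_url(group, project))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

In the job itself this call is made for you by the API Query components described below; the sketch is only to show the shape of the request.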
Building a job to handle both initial and incremental loads
I am using a job design pattern that handles both the initial load and incremental load scenarios.
The first step in this job is a Python Script component, which I use to get the high-watermark value (the latest load date) from my task history table. On an incremental load, I use this date as a filter on the extract. However, if the query fails because the job is running for the first time and no data is available yet, I branch off to create the table if it doesn’t exist and use a default load date instead of the high watermark.
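The branching logic of that first step can be sketched as plain Python. The table name, variable name, and default date below are assumptions, and the commented lines show roughly how this would wire into Matillion’s Python Script component, where a `context` object exposes a cursor and job variables.

```python
# Hedged sketch of the high-watermark step. Table, variable, and
# default-date values are assumptions for illustration only.
DEFAULT_LOAD_DATE = "2000-01-01 00:00:00"


def resolve_watermark(fetch_max_date):
    """Return the latest load date, falling back to the default when the
    query fails (e.g. on the very first run, before the table exists)."""
    try:
        row = fetch_max_date()
        return row[0] if row and row[0] else DEFAULT_LOAD_DATE
    except Exception:
        return DEFAULT_LOAD_DATE


# Inside a Matillion Python Script component this might look like
# (names assumed):
#
#   cursor = context.cursor()
#   def query():
#       cursor.execute("select max(load_date) from task_history")
#       return cursor.fetchone()
#   context.updateVariable("high_watermark", resolve_watermark(query))
```

Keeping the fallback logic in one small function makes the initial-load and incremental-load paths easy to test in isolation.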
Two API Query components are then used to extract the summary and detailed data and stage it in Amazon Redshift (the Snowflake and Google BigQuery products use the same components).
Transform the raw API data
As previously mentioned, the Task History detail API reports at the component level. I want to aggregate this up to the job level to visualize with Amazon QuickSight. I also want to filter out validation events and keep only the execution events, so I have components to handle these transformations.
Then I can join the aggregated detail to the summary data, which provides job-level information such as the job name, project, and version.
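The filter, aggregate, and join steps can be expressed as a small sketch in plain Python rather than Matillion transformation components. The field names (`type`, `taskID`, `rowCount`, `id`) are assumptions about the API payload, not confirmed names.

```python
# Hedged sketch of the transform logic; field names are assumptions
# about the Task History payloads, used for illustration only.
def aggregate_components(detail_rows):
    """Keep execution events only, then roll component rows up to job level."""
    jobs = {}
    for row in detail_rows:
        if row.get("type") != "EXECUTION":   # drop validation events
            continue
        job = jobs.setdefault(row["taskID"], {"rowCount": 0, "components": 0})
        job["rowCount"] += row.get("rowCount", 0)
        job["components"] += 1
    return jobs


def join_summary(jobs, summary_rows):
    """Attach job-level info (name, project, version) from the summary feed."""
    return [
        {**summary, **jobs[summary["id"]]}
        for summary in summary_rows
        if summary["id"] in jobs
    ]
```

In the real job, a Filter component drops the validation events, an Aggregate component produces the job-level roll-up, and a Join component attaches the summary fields.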
After I have flattened this data set, I can convert the dates to a format that is useful in the BI layer and calculate the job runtime from the start and end dates.
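Assuming the API reports start and end times as epoch milliseconds (an assumption worth verifying against your own payloads), the conversion and runtime calculation amount to:

```python
# Hedged sketch: convert epoch-millisecond timestamps (assumed API format)
# to BI-friendly strings and derive the job runtime in seconds.
from datetime import datetime, timezone


def to_timestamp(epoch_ms: int) -> str:
    """Render an epoch-milliseconds value as a UTC timestamp string."""
    dt = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
    return dt.strftime("%Y-%m-%d %H:%M:%S")


def runtime_seconds(start_ms: int, end_ms: int) -> float:
    """Job runtime derived from the start and end times."""
    return (end_ms - start_ms) / 1000.0
```

In the Matillion job these are simple Calculator component expressions; the Python version is only to make the arithmetic explicit.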
The final step is to date-stamp each row with the load date and append the data to my target table, which I use as the Amazon QuickSight data source.
Visualizing the Data
After connecting Amazon QuickSight to Amazon Redshift and selecting my task history table, I can begin to answer the questions I asked at the start of this article:
How many jobs failed in the last 24 hours?
Here I create a summary table that includes the job state, filtered to show the last 24 hours. I could also visualize the state in a simple pie chart.
Is the runtime of my jobs increasing or decreasing over time?
To answer this question, I show the average job time (in seconds) by start date in a line graph.
Is the amount of records being processed increasing or decreasing over time?
A bar chart shows the maximum row count of a particular job per day.
What is the average time it takes to run a particular job?
Again using a bar chart, I can compare the average job times of different jobs.
These graphs make up my ETL monitoring dashboard.
You can, of course, choose to visualize the data in your own way. However, we hope this serves as a useful starting point for how you can build your own ETL job monitoring dashboard using the Matillion Task History. To help you get started, download the Sample Job I used in this example.
[callout-content title="Want to try out Matillion ETL for Amazon Redshift with QuickSight? You can take a free 72-hr Test Drive or start a 14-day Free Trial."]
[button title="Take 72-hr Test Drive" link="https://info.matillion.com/etl-redshift-test-drive"]
[button title="Start 14-day Free Trial" link="https://matillion.com/etl-for-redshift/free-trial/" style="ghost"]