
Matillion ETL Error Handling Patterns


Matillion is a low-code/no-code platform for data transformation and integration. As part of that platform, Matillion ETL enables you to build easily maintainable data processing workflows that help you unlock the full value of your data.

Part of making data processing workflows resilient is handling runtime errors robustly. Many potential problems are caught during compilation, a step known as validation. But runtime errors can still occur. It is important to be ready for them, so that problems can be diagnosed and corrected quickly and monitored over time.

This guide describes some common error handling patterns that you can use with Matillion ETL.

 

Prerequisites

The prerequisites for implementing error handling patterns are:

Optionally, you may wish to create your own customized shared jobs to add into your error handling.

 

Default error handling in Matillion ETL

Matillion ETL enables you to create arbitrarily sophisticated data processing pipelines, using layers of Run Orchestration and Run Transformation components.

By default, unhandled errors simply cascade up to the caller. When a parent job launches a child job and the child job hits an error, the parent job will hit the same error.

A good pattern is to have just one Run Orchestration in the top-level job. This allows you to split the design into two logical parts:

  • Top-level job – focus on scheduling, logging and error handling
  • Lower-level jobs – focus on data transformation and integration

Use just one Run Orchestration in the top-level job

When an error occurs within a job designed this way, the default error handling produces output like the following in the Tasks panel:

The out-of-the-box error handling works well. A failure bubbles right back up to the top level and causes the whole job to fail, regardless of how deeply nested it was.
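As an analogy only, this cascading behavior resembles ordinary exception propagation in a programming language. The sketch below is not Matillion code: `JobFailed` and the job functions are hypothetical stand-ins. The lowest-level job fails, the middle job has no failure branch so the error passes through it unchanged, and the single top-level job is the one place that handles it.

```python
class JobFailed(Exception):
    """Hypothetical stand-in for a failed Matillion job."""


def load_customers() -> None:
    # Lowest-level job: one of its components hits a runtime error.
    raise JobFailed("Table 'customers' does not exist")


def integrate() -> None:
    # Middle-level job: runs the child job with no failure branch,
    # so the error propagates to this job's caller unchanged.
    load_customers()


def top_level() -> str:
    # Top-level job: the single place that handles errors, like one
    # Run Orchestration that has a failure branch attached.
    try:
        integrate()
    except JobFailed as err:
        return f"FAILED: {err}"
    return "SUCCESS"


if __name__ == "__main__":
    print(top_level())  # FAILED: Table 'customers' does not exist
```

However deep the nesting, only the top level needs handling logic, which is the benefit of keeping scheduling, logging and error handling in one top-level job.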

On a failure branch, an automatic environment variable named ${detailed_error} is made available. It contains the full error message – including the stack of job names, the name of the component that failed and the error message itself.

The detailed_error variable is useful, but its text is usually long, sometimes spanning multiple lines, and takes effort to parse and interpret. To capture just the error text, you need to pass messages upwards in the call stack.

 

Passing messages upwards in the call stack

Data processing jobs usually don’t require many input variables. Their purpose is to deal with whatever data they find in the cloud data warehouse or lakehouse. However, whenever you do need to provide parameters down into another Matillion ETL job, you can do so using the Set Scalar Variables and/or Set Grid Variables properties.

To pass information in the reverse direction – from a called job back up to the caller – use the Export tab.

Use the Export Tab to receive parameters upwards from a called job

To use the Export tab, open a Matillion orchestration job and select a component. The Export tab will become visible.

Open the mapping by pressing the Edit button. In the dialog that appears, you can map “source” values to target variables:

  • Source – a set of built-in values. When exporting from a called sub job, the list additionally includes all the public job variables owned by the called job
  • Target variable – can be any Matillion job variable (preferred) or environment variable

Job variables are usually a better choice. Environment variables are shared among all jobs and therefore come with a higher chance of accidental usage overlap.
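The risk of overlap can be illustrated with a simplified model, assumed here for illustration rather than taken from Matillion's implementation: environment variables behave like one mapping shared by every job, while each job run gets its own private mapping of job variables. `run_job` and the variable name `error_message` are hypothetical.

```python
# Simplified model (an assumption for illustration): one shared mapping
# for environment variables, one private mapping per job run.
ENVIRONMENT: dict[str, str] = {}


def run_job(name: str) -> dict[str, str]:
    """Hypothetical job that records an error under a common variable name."""
    job_vars: dict[str, str] = {}
    ENVIRONMENT["error_message"] = f"error from {name}"  # shared: can be overwritten
    job_vars["error_message"] = f"error from {name}"     # private to this run
    return job_vars


if __name__ == "__main__":
    a_vars = run_job("job A")
    b_vars = run_job("job B")
    print(ENVIRONMENT["error_message"])  # error from job B (job A's value is lost)
    print(a_vars["error_message"])       # error from job A (still intact)
```

Two jobs that happen to choose the same name silently overwrite each other's environment variable, while their job variables stay separate.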

Use Job Variables in preference when communicating with called jobs

To capture the error coming from a component whenever it fails, select that component and map the Message source to a chosen job variable. In the example below, the job variable is named p_error_message.
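The Export mapping can be thought of as copying named source values into target variables after a component runs. The sketch below models that idea; `ComponentResult`, `run_component` and `apply_export` are hypothetical, and the `Status` source is an assumed example alongside the `Message` source described above.

```python
from dataclasses import dataclass


@dataclass
class ComponentResult:
    """Hypothetical subset of the built-in values a component exposes."""
    status: str
    message: str


def run_component(ok: bool) -> ComponentResult:
    # Stand-in for a single orchestration component.
    if ok:
        return ComponentResult(status="success", message="")
    return ComponentResult(status="failed", message="Access denied to bucket")


def apply_export(result: ComponentResult, mapping: dict[str, str],
                 job_vars: dict[str, str]) -> None:
    """Copy each mapped source value into its target job variable.

    `mapping` plays the role of the Export tab: source name -> target variable.
    """
    sources = {"Status": result.status, "Message": result.message}
    for source, target in mapping.items():
        job_vars[target] = sources[source]


if __name__ == "__main__":
    job_vars = {"p_error_message": ""}
    apply_export(run_component(ok=False), {"Message": "p_error_message"}, job_vars)
    print(job_vars["p_error_message"])  # Access denied to bucket
```

Because the mapping is attached to one component, the captured message is that component's own error text rather than the longer combined text of `detailed_error`.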

This allows you to capture error messages from individual components.

The Message export is specific to individual components

To pass a value upwards in the call stack, use the Export tab of the Run Orchestration component in the parent job. Map the child job’s variable to the parent job’s variable.

In this way, you can pass a value up any number of levels in the call stack.
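Chaining these mappings is what carries a value through several levels. In this hypothetical model, each job returns its public job variables, and `run_orchestration` applies an Export-style mapping from child variable names to parent variable names. It assumes the lowest-level job has already captured its component's error into `p_error_message`; all function and variable names other than `p_error_message` are illustrative.

```python
def child_job() -> dict[str, str]:
    # Lowest level: assume a component's Message was already exported
    # into this job's own variable.
    return {"p_error_message": "Column 'id' not found"}


def run_orchestration(job, mapping: dict[str, str],
                      parent_vars: dict[str, str]) -> None:
    """Hypothetical Run Orchestration: run `job`, then apply its Export mapping.

    `mapping` maps a child job variable name to a parent job variable name.
    """
    child_vars = job()
    for child_name, parent_name in mapping.items():
        parent_vars[parent_name] = child_vars[child_name]


def middle_job() -> dict[str, str]:
    job_vars = {"p_error_message": ""}
    run_orchestration(child_job, {"p_error_message": "p_error_message"}, job_vars)
    return job_vars


def top_level_job() -> dict[str, str]:
    job_vars = {"p_last_error": ""}
    run_orchestration(middle_job, {"p_error_message": "p_last_error"}, job_vars)
    return job_vars


if __name__ == "__main__":
    print(top_level_job()["p_last_error"])  # Column 'id' not found
```

Each level only needs to know the variable names of the job it calls directly, so adding another level means adding one more mapping.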

 

Using the Retry pattern

Matillion operates within the context of a broad set of interconnected data layers and processes known as the Data Fabric.

In this type of infrastructure, a significant proportion of the errors that can occur are random, unpredictable and transient. Typical examples are temporary network unavailability or overload.

Within a Matillion ETL orchestration job, the Retry component is a good way to deal proactively with temporary errors like these. The strategy is simply to re-run the failing task a fixed number of times, optionally with a delay (backoff) between attempts.

You can use a Retry to wrap any orchestration job component:

  • Wrapping a Run Orchestration (as shown below) – the entire sub job will be re-run
  • Wrapping another component – that individual component will be re-run

Once a component is wrapped inside a Retry, the red failure path is followed only if every attempt, including all of the retries, fails. The retry loop ends immediately after a successful execution.

A Retry starts counting from the second attempt
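The same retry logic can be written in general-purpose code. This is a minimal sketch of a fixed-count retry with an optional fixed delay, not the Retry component's implementation; `with_retry`, `number_of_retries`, `backoff_seconds` and `make_flaky` are illustrative names. Because counting starts from the second attempt, a retry count of 2 allows up to 3 attempts in total.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(task: Callable[[], T], number_of_retries: int,
               backoff_seconds: float = 0.0) -> T:
    """Run `task`, re-running it up to `number_of_retries` more times on failure.

    Returns as soon as one attempt succeeds. If every attempt fails, the last
    error is re-raised, which corresponds to following the failure path.
    """
    retries_used = 0
    while True:
        try:
            return task()
        except Exception:
            if retries_used >= number_of_retries:
                raise  # all attempts failed
            retries_used += 1
            if backoff_seconds > 0:
                time.sleep(backoff_seconds)  # optional fixed delay before retrying


def make_flaky(failures: int) -> Callable[[], str]:
    """Hypothetical task that fails `failures` times, then succeeds."""
    calls = {"count": 0}

    def task() -> str:
        calls["count"] += 1
        if calls["count"] <= failures:
            raise ConnectionError("temporary network error")
        return f"succeeded on attempt {calls['count']}"

    return task


if __name__ == "__main__":
    print(with_retry(make_flaky(2), number_of_retries=2))  # succeeded on attempt 3
```

Wrapping a whole sub-job or a single component corresponds to passing a different `task`: the retry logic itself does not change.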

 

Next Steps

You may find these further Matillion DataOps articles useful: