A closer look at improved error handling in Matillion ETL version 1.51
It’s great when things go right when you’re loading and transforming your data. But it’s just as important, if not more so, to know when things don’t go so well.
Matillion ETL already has a number of ways to help make jobs more resilient to errors, including explicit error handling, logging, and notifications. However, as cloud infrastructures become more complex, often extending across multiple platforms, so do the ETL jobs that support them. Feedback from our customers confirms this new reality. Many of them say that failures are now harder to diagnose because:
- Errors may be several layers deep within a job/subjob hierarchy
- There may be more than one error triggering a failure
Error handling for complex projects
Matillion ETL version 1.51 introduces new and improved error handling functionality that enables you to:
- Configure generic error handling rules for an entire project, removing the need to explicitly configure error handling per component (unless specifically desired).
- Pass information from all levels of a job to a single consolidated error message, which is then posted to a webhook of your choice. This means an error several layers deep in a nested job can be found within that single message and traced back to the source of the problem.
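To make the second point concrete, here is a minimal Python sketch of the general idea of rolling up failures from a job/subjob hierarchy into one message. It is not Matillion ETL's implementation: the `Task` class, the `collect_errors` function, and the job names are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    """Hypothetical stand-in for one component or subjob in a run."""
    name: str
    error: Optional[str] = None  # error text if this task itself failed
    children: List["Task"] = field(default_factory=list)


def collect_errors(task: Task, path: str = "") -> List[str]:
    """Walk the hierarchy depth-first; return one line per failure, with its full path."""
    here = f"{path}/{task.name}" if path else task.name
    lines = [f"{here}: {task.error}"] if task.error else []
    for child in task.children:
        lines.extend(collect_errors(child, here))
    return lines


# Two failures at different depths end up in one consolidated message.
job = Task("main_job", children=[
    Task("load_orders", error="table not configured"),
    Task("sub_job", children=[
        Task("python_step", error="exception raised"),
    ]),
])

message = "\n".join(collect_errors(job))
print(message)
```

Because each line carries the full path from the top-level job down to the failing task, a reader of the single message can trace every failure back to where it occurred.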
Let’s look at how these new features work.
Adding error reporting to a project
First, we set up the connection to our webhook from the project menu in Matillion ETL using the new Manage Error reporting option:
From here, we can configure the response to the webhook in the payload (i.e., the message sent to the webhook):
As part of the error handling we can also configure:
- The interval and a maximum number of posts per interval, which limits the number of errors sent to the webhook. Setting the maximum to 0 means there is no limit.
- The payload, which can include a number of new variables such as the details of the errors encountered.
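The interval and maximum settings describe a rate limit. As a rough illustration of the semantics only, the Python sketch below assumes a simple fixed-window counter; how Matillion ETL actually counts posts within an interval is not stated here, and the `PostLimiter` class and its parameter names are hypothetical.

```python
import time
from typing import Callable


class PostLimiter:
    """Allow at most `max_posts` posts per `interval_seconds`; 0 means unlimited."""

    def __init__(self, interval_seconds: float, max_posts: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval_seconds
        self.max_posts = max_posts
        self.clock = clock  # injectable so the behaviour can be checked without waiting
        self.window_start = clock()
        self.sent = 0

    def allow(self) -> bool:
        """Return True if another post may be sent now, and count it if so."""
        if self.max_posts == 0:
            return True  # a maximum of 0 disables the limit
        now = self.clock()
        if now - self.window_start >= self.interval:
            # The interval has elapsed: start a new window with a fresh count.
            self.window_start = now
            self.sent = 0
        if self.sent < self.max_posts:
            self.sent += 1
            return True
        return False
```

With an interval of 60 seconds and a maximum of 2, a third error inside the same minute would be suppressed, and posting would resume once the next interval begins.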
The image below shows an example job where components throughout the job and subjobs are set to fail (failures include raising an exception in a Python component, and unconfigured table components):
The task view of the job shows the various failures at different levels of the job:
Based on the configuration of the error reporting payload above, the following error would then be sent to the webhook:
What can this be used for in practice?
We discussed the error handling functionality with users at a focus group, and the consensus was that pushing to a webhook provides the most flexibility for everyone. It lets users pick up their notifications wherever they need them and means Matillion can integrate with other software monitoring tools such as Datadog.
An example would be to post a message in Slack by configuring the payload in Matillion ETL according to the syntax required by the Slack webhook as in the example below:
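For readers who want to see the shape of such a payload in text form, here is a minimal Python sketch. It assumes a Slack incoming webhook, which accepts a JSON body with a `text` field. The `build_slack_payload` function and the `job_name` and `error_details` parameters are hypothetical placeholders; in Matillion ETL the error details would come from the payload variables, whose names are not reproduced here.

```python
import json


def build_slack_payload(job_name: str, error_details: str) -> str:
    """Return a JSON body in the form a Slack incoming webhook accepts."""
    # Placeholder wording; any text layout would work.
    text = f"Matillion ETL job '{job_name}' failed:\n{error_details}"
    return json.dumps({"text": text})


body = build_slack_payload(
    "main_job",
    "main_job/sub_job/python_step: exception raised",
)
print(body)
```

Using `json.dumps` rather than string concatenation keeps the body valid JSON even when the error text contains quotes or newlines.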
Errors would then trigger a message to be posted to Slack, as shown in the screenshot below:
It is worth noting that this feature is intended to surface unhandled errors: that is, those that result in the failure of a job run. Any handled errors – those in which the ‘failure’ output is connected to another component but otherwise do not cause the run to fail – would not show up in the error handling payload unless they were also connected to an ‘end failure’ component.
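The rule above can be written as a small decision function. This Python sketch only models the rule as described in this post; the `ComponentOutcome` class and its field names are hypothetical and do not correspond to any Matillion ETL API.

```python
from dataclasses import dataclass


@dataclass
class ComponentOutcome:
    """Hypothetical summary of how one component finished in a run."""
    name: str
    failed: bool
    failure_output_connected: bool = False  # the 'failure' output leads to another component
    reaches_end_failure: bool = False       # that path ends in an 'end failure' component


def is_reported(outcome: ComponentOutcome) -> bool:
    """Return True if this outcome would appear in the error reporting payload."""
    if not outcome.failed:
        return False
    if not outcome.failure_output_connected:
        return True  # unhandled: the failure causes the run to fail
    # Handled failures are reported only if they still end the run in failure.
    return outcome.reaches_end_failure


outcomes = [
    ComponentOutcome("unhandled_step", failed=True),
    ComponentOutcome("handled_step", failed=True, failure_output_connected=True),
    ComponentOutcome("handled_then_fail", failed=True,
                     failure_output_connected=True, reaches_end_failure=True),
]
reported = [o.name for o in outcomes if is_reported(o)]
print(reported)
```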
Try our new error handling capabilities for yourself
We would still recommend explicit error handling in the following cases:
- When a failure might be reasonably common, for example a retry iterator for a connection which might not be 100% reliable
- When a specific action needs to happen, as we previously discussed in our blog series on error handling
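The retry case in the first bullet follows a familiar pattern. The Python sketch below shows that general pattern only; it is not how a Matillion ETL retry iterator is implemented, and the `with_retries` function and its parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[], T], attempts: int = 3,
                 delay_seconds: float = 1.0) -> T:
    """Call `action`, retrying on ConnectionError up to `attempts` times in total."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of retries: let the failure surface
            time.sleep(delay_seconds)
    raise AssertionError("unreachable")
```

Only `ConnectionError` is retried, so an expected, transient failure is handled explicitly, while anything else still propagates and can be caught by project-level error reporting.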
But the ability to capture all errors across a project and post the result to a webhook will reduce the effort in identifying and resolving any unhandled or unexpected failures.
Want to check out all of the latest features in Matillion ETL v1.51?
If you have Matillion ETL and want to upgrade to the latest version, check out our blog on best practices for updating your Matillion ETL instance.
If you want to see what Matillion ETL can do for your business, request a demo.
The post A closer look at improved error handling in Matillion ETL version 1.51 appeared first on Matillion.