ETL Coding: 10 Reasons Why You Should Stop Hand Coding Your ETL Jobs
Companies are still manually hand coding ETL jobs and are losing time and money in the process. Instead, they could redirect resources to data analysis and reporting insights, driving critical business decisions with the help of a data integration tool.
While there are still some valid use cases for hand-coding, we see both small and large companies benefiting from the adoption of an ETL data integration tool. This is because ETL tools reduce or completely negate the need to hand code
Is it time for you to step away from the keyboard and stop hand coding your ETL jobs? Here are 10 reasons why you should stop hand coding your ETL jobs, now.
1. Custom Code Means Custom Infrastructure
You have spent weeks building and fine-tuning your hand coded application, and it’s finally ready! Before you hit deploy, you’ll have to consider what infrastructure you’re going to use and whether you need a separate team to provision and manage this technology. You may also want to consider deploying a CI/CD pipeline so that your future releases are automated. Whatever the choices are it results in more work which means, actually, you can’t run your application yet. You’ll have to park it whilst you provision your infrastructure. Well that’s not fun, is it?
It is also a common practice to replicate the infrastructure into different environments in order to separate work in progress from the production system. This immediately increases your expenses, therefore, naturally, you would start considering options which would reduce your outgoing with the trade-off being that your replicated environments are no longer identical.
2. Custom Code Needs Optimizing
Success! Your infrastructure is now in place with your “replicated” environments. You deploy your application into the development environment and run it. It works but the likelihood is that it’s not utilizing the full potential of the machine; this is more common with hand coded ETL applications. So then you’ll have to spend time tweaking the deployment parameters. This may include memory usage, parallelism, cores to use per task/node etc.
Great! After some hard work you’ve found the optimal running state and your application is running at it’s best in the development environment. However, you then realize that you’ll have to repeat the process when you come to productionize it, simply because you’re production environment holds a much more powerful group of infrastructure.
The reality is that it’s easy to underestimate the amount of effort required when going through the custom code process. This has resulted in a lot of big organizations under-budgeting their ETL projects and realizing that too much time is being put into the ETL stages and not enough time being put into bringing value back to the company.
3. Custom Code is Expensive
What does all this mean? Custom code is expensive.
Already have an established team of developers doing all your coding by hand? This might seems like the most cost effective model for developing code since it is already in place. And you might be right. Organizations can save up 20% on short-term costs by hand coding. However, this may just mean you have a low ‘entry cost’ for your code. If we look at the total cost of ownership you might be surprised.
The maintenance of hand-coded pipelines can increase costs by up to 200%, which includes testing, debugging and updating code as needed.
We also hear from more than one customers that they had an expert coder who wrote all their bespoke code and then left the company leaving them in a situation where they are not able to make changes or update code. This obviously negatively impacted reporting and analytics, prompting their search for an ETL tool.
4. ETL tools are affordable
“Okay, so my custom code might be expensive, but it has to be cheaper than an enterprise ETL tool!”
That’s not necessarily the case. The ETL tool landscape has changed quite drastically in recent years. Many ETL tools are adopting pay-per-use models as is now common cloud data warehouses. Furthermore, the nature of ETL tools contribute to faster development cycles saving time and long-term costs.
Many ETL tools offer a visual component that validates your job as you build it or update it. This means you don’t have to wait until runtime to see if it works. With a GUI-based ETL tool, you’ss have confidence before you hit ‘run’ that your job will be successful, reducing the time it takes to debug bad code.
The cost of an ETL tool might not be as much as you think. Microsoft TechNet reports that “ETL development is often underestimated and can turn out to be more complex than imagined.” This necessitates larger teams to write custom code for numerous integrations. With each SQL Developer costing on average $83,898 per year in the United States (Indeed), additional resource can be costly! An annual license of Matillion ETL is less than $20,000 for an m4.large instance on AWS.
5. ETL Tools Help You Develop Jobs Faster
Hand coding takes a lot of time if you don’t have this guy on your team.
In all seriousness, by nature ETL tools are designed to increase the speed of development with pre-built data integrations making your SQL commands easy to build out a complete end-to-end data pipeline. Furthermore, ETL tools can create collaborative work environments where team members can work at the same time and control development with annotations and version control. This can be especially beneficial if you have a big project supported by a large team.
It is also possible to build ETL jobs or a series of jobs which are parameterized and therefore can be reused in other parts of the ETL. This saves writing duplicate code and means updates to business rules only need to be made in one place.
If you are able to speed up your development pipeline, it means your company is better able to get the right data to the right people, at the right time. Data accessibility and the ability to self-serve bespoke data needs across your company empowers data-driven decisions to be made.
6. Foster Greater Data Understanding Across Multi-Disciplinary Teams
Technical users, often with many years of experience, are usually the ones that write the code and understand it. However, the end users of the data are not usually the code authors. They could be C-suite executives, digital marketers or BI specialists. While these individuals will have a strong understanding of the data and how to interpret the result, they may not be able to easily understand the transformations the data has gone through or how this is written in code.
If the data needs to assessed and understood by a wide and diverse audience, an ETL tool can help these individuals understand and critique the transformations to ensure the data is being appropriately handled. ETL tools like Matillion with an icon-based, graphical interface and drag and drop functionality can help developers communicate complex ETL jobs in a straightforward and visual way.
The days of ringing Tech Support you can’t get a hold of to request new reports or changes are over. Now data users across the company can be ETL-literate without SQL training!
7. ETL tools are resilient
Hand coded ETLs can often work well for transformations. However, hand coding proves to be less resilient for loading your external data. Alternatively, an ETL tool will provide more resilience to a moving target.
For example, if you want to load data from Facebook, you will need to use the Facebook API. However, many APIs, especially social media sites and Google based APIs are regularly updated for security. Each of these changes which will require an update to a hand coded data load job. An ETL tool, on the other hand, will manage these updates for you. Each update will be quicker, require less work and provide a more seamless process to upgrade.
8. No One Likes Producing Documentation
If you are anything like most developers, documentation is a pain point. Documenting each hand-coded ETL job is time-consuming and comes with a huge overhead. Writing the documentation either during or after writing the code will increase development time two-fold.
Many ETL tools now provide some form of self-documentation for both the data flow in the ETL and the data lineage for outputs. In the most basic cases, this is just a visualization of the jobs built, but this will still mean more to the business users than a copy of the handwritten code.
Matillion takes this one step further and offers auto-documentation which provides a comprehensive step-by-step guide that captures notes/annotations. There is also a data lineage feature customers can use to map where data has come from.
9. ETL tools can be Future Proof
Writing code for a particular data warehouse needs to be in the language specific to that system. Most ETL tools, however, don’t do this! They are generalistic to work with many data warehouses. This means that each set of code written is specific to each individual data warehouse. If you ever want to change your target system, you will need to re-engineer all your code. Unless you have an ETL tool that supports portability. Make sure you select an ETL tool that is future proof.
The principle is the same for the data sources. To hand code a pull from a source system will often require logic specific to that source system. However, an ETL tool may have a component for this source system so changing source in an ETL tool is just a case of swapping out the components rather than completely rewriting the hand-coded logic to do the data pull.
10. ETL Tools Do More Than Just ETL!
ETL tools, for example, can also help you with unit testing. When custom code is run it either succeeds or fails as a unit. This makes it very difficult to identify exactly where and why the failure occurred. To investigate takes experienced developer time and effort to break the code down or for the code to have error handling at every stage.
An ETL tool, on the other hand, will break down every step of the logic so you can visually, quickly, and easily identify what has gone wrong and where. Some ETL tools will also tell you the values of any parameters at runtime which helps identify any issues with dynamic code.
Do you have questions about how an ETL tool can help your company save time and money? Book a Matillion Demo today and speak to a Solution Architect.