To Extract, Transform, Load (ETL), or Extract, Load, Transform (ELT), that is the question. What is the future of ETL/ELT? Is ETL really dead? Is ELT the new ETL? Are they the same thing? What is the difference between the two? How to do I decide which method is the right method for my use case?
These are the questions we frequently hear within the data migration community and from our clients. While Matillion uses ETL and ELT almost interchangeably, there are some differences between the two, and with those advantages and disadvantages. On the whole, we prefer an ELT approach as this is provides our clients with the greatest outcome and added value. Register for our ETL vs. ELT: What’s the big difference webinar to learn more.
A previous blog article The “ELT lie” and how to get real benefit from ELT architecture presented the differences between the two methods which we will recap briefly.
What is ETL?
Traditional ETL tools extract data from its source systems, transform it in a separate environment, and then load it into a target database for BI/reporting.
- E – extracting data from source
- T – transforming (i.e. converting) the raw data into a format that’s suitable for reporting and analytics
- L – loading data into a target platform (e.g. a relational database)
The ETL engine is therefore a compute resource, and as such needs to be powerful enough to handle large amounts of data which are going to be transformed. Often “powerful” also means expensive!
As you would probably expect there are some limitations with the traditional ETL workflow. Namely, the environments running ETL software are not built to scale in the same way as data warehouses now are. Therefore, when data volumes increase and workloads become more complex these environments consume more IT resources, create bottlenecks in the data chain and can negatively impact your reporting and analytics. The worst outcome – bad business decisions, being made slowly resulting in adaptability loses and missed opportunities.
However, it is not all necessarily doom and gloom. If you’re still using an on-premise infrastructure, your data is predictable, coming from only a small number of sources and your transformations are minimal – ETL could still be a legitimate cost-effective strategy. We suspect that for most modern companies however, this is not the case.
What is ELT?
So what options do you have if you have a lot of data from numerous sources and you want to shape, clean, filter, join and transform that data?
“ELT”, as opposed to ETL, extracts data from source systems, loads it in its raw form into a target platform, and then allows you to transform it in-database. This makes all data accessible and easy to audit.
ELT leverages the power of the data warehousing platform itself to perform transformations, and get the data into an analytics-ready format. Matillion pushes down all transformations to the data warehouse itself, and means that you only need one powerful piece of infrastructure. Further advancements on relational databases, such as Amazon Redshift’s new Spectrum functionality, makes transforming your data in-database easier, faster and more cost efficient. This design results in savings on infrastructure, better performing workloads, and shorter development cycles. Your data is quickly migrated and immediately available for transformations and analysis based on current business questions and needs.
Also it means you don’t need to know how you are going to use that data from the start. You have the freedom to transform at a later stage once its use case becomes more clear. This ability is increasingly appealing given the changing nature of development with the rise of iterative Agile methodologies. Thus ELT, may be more aligned with the current technical mindset.
So ELT is the best option?
The answer to this question is not a simple ‘Yes’ or ‘No’. ELT has some obstacles that you shouldn’t ignored.
Since your transformations are being done in-database, you will need available space and compute power to perform the desired transformations. Without this performance and queries will suffer. Cloud-based platforms such as Amazon Redshift, Snowflake and Google BigQuery, however, facilitate scalability in a cost efficient manner that helps address this challenge. As previously mentioned, continuous advancements such as Spectrum for Amazon Redshift, make loading your data and then transforming it even easier and faster!
Another problem we often hear about is the timely and labour intensive process of script writing. This is not just for loading data, but transforming it. It applies especially when dealing with unusual or unstructured data types, or where access is not simply file-based. The more data, data sources and needed outputs can make these complex opening them open to human error. It could take days or weeks to script, run and rollback in instances where mistakes are made. Luckily, there are a number of tools on the market that make this process quicker and easier, such as Matillion.
Matillion for Amazon Redshift, Snowflake and Google BigQuery use an ELT approach to mitigate the historical constraints associated with ETL. Offering an alternative to ETL, Matillion allows users to pull data from multiple sources into one location for analysis. Matillion’s wide range of connectors help to mitigate the issues with unusual and unstructured data by mitigating the need to manually code script. Using an ELT tool allows companies to benefit from superior scalability and cost efficiency. Matillion is fast and accessible to both technical and non-technical users given its intuitive design. This puts the right data is in the right hands at the right time.
So, Extract, Transform, Load (ETL), or Extract, Load, Transform (ELT)? ETL can still bring clients benefits in cases of extreme staticism. However, the changing data landscape is characterised by ever-changing user requirements, more data and data sources, increasingly complex queries/transformations and increased movement to the cloud. This means these use cases are dwindling, if not almost extinct. While we wouldn’t go as far as ETL is dead, its future is bleak in light of ELT advantages. If ETL looks to improve its biggest setback, performance, we would expect the result to be an ELT workflow.
Watch the full length webinar.