
Better Together: Matillion ETL for Delta Lake on Databricks Now Available

 

Unlock your lakehouse potential with Matillion’s cloud-native data integration platform

 

 

With the arrival of Matillion ETL for Delta Lake on Databricks, data professionals across the organization can now leverage visually designed data transformations to build their lakehouse architecture and unify their data ecosystem.

 

These days, companies need a place to store structured data in the cloud, and a cloud data warehouse fits the bill. But in a modern data ecosystem, companies often also have to deal with semi-structured data from a variety of sources that doesn't fit the tabular data paradigm. Nor does that data necessarily need to be stored in a data warehouse if it is being used for applications such as machine learning. Modern organizations need a cloud data stack that includes both a cloud data warehouse and a data lake.

 

 

Databricks, Matillion ETL, and the lakehouse

 

Now organizations don’t have to choose between the two architectures, or even establish separate entities in the cloud. The lakehouse architecture offers the best of both the structured and semi-structured worlds: it’s a convergence of the data lake and cloud data warehouse environments.

Databricks is a pioneer in lakehouse technology. And now, Matillion is excited to announce the general release of Matillion ETL for Delta Lake on Databricks to help address the rising demand for this new architecture. With Matillion ETL for Delta Lake on Databricks, enterprises using Databricks now have a cloud data transformation product that is purpose-built for the lakehouse with Databricks SQL.

 

 

How does the lakehouse with Databricks SQL work?

A lakehouse brings structured and semi-structured data together in the same query environment.

The benefits of this approach include:

  • A common work environment: The same datasets are available in the preferred work environment of every user across the organization (BI tools for analysts, Spark for data scientists).
  • Cost-effectiveness: Infrequently accessed data can be stored in low-cost data lake formats without needing additional ETL for use in traditional analytics environments.
  • Optimal performance: The lakehouse can easily serve datasets from more performant engines as their popularity rises across the organization, again without requiring external data movement.

The combined format lets you store all of your data in the cloud more cost-effectively. When you load data, the lakehouse essentially decides where it lives based on its level of structure. Not only does the lakehouse offer the best of both worlds, but it also simplifies collaboration between data engineers and data scientists, because all data is contained under one roof.
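To make the idea of a single query environment more concrete, here is a minimal sketch in plain Python. It is not Databricks or Matillion code: the `customers` table, the JSON event records, and the helper names (`flatten`, `to_table`, `revenue_by_region`) are all hypothetical. It shows semi-structured records being projected onto a fixed set of columns so they can be joined with structured rows in one query.

```python
import json
from typing import Any

# Hypothetical structured table: fixed schema, one dict per row.
customers = [
    {"customer_id": 1, "name": "Acme", "region": "EMEA"},
    {"customer_id": 2, "name": "Globex", "region": "AMER"},
]

# Hypothetical semi-structured events: nested and optional fields vary per record.
raw_events = [
    '{"customer_id": 1, "event": "click", "device": {"os": "ios"}}',
    '{"customer_id": 2, "event": "purchase", "amount": 42.5}',
    '{"customer_id": 1, "event": "purchase", "amount": 10.0, "device": {"os": "android"}}',
]


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested objects into dotted column names, e.g. device.os."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def to_table(lines: list[str], columns: list[str]) -> list[dict[str, Any]]:
    """Parse each JSON line and project it onto fixed columns; absent fields become None."""
    rows = []
    for line in lines:
        flat = flatten(json.loads(line))
        rows.append({col: flat.get(col) for col in columns})
    return rows


def revenue_by_region(events: list[dict[str, Any]],
                      customers: list[dict[str, Any]]) -> dict[str, float]:
    """Join events to customers on customer_id and sum purchase amounts per region."""
    region_of = {c["customer_id"]: c["region"] for c in customers}
    totals: dict[str, float] = {}
    for row in events:
        if row["event"] == "purchase" and row["amount"] is not None:
            region = region_of[row["customer_id"]]
            totals[region] = totals.get(region, 0.0) + row["amount"]
    return totals


events = to_table(raw_events, ["customer_id", "event", "amount", "device.os"])
print(revenue_by_region(events, customers))  # {'AMER': 42.5, 'EMEA': 10.0}
```

In an actual lakehouse, this projection and join would be expressed in SQL or Spark over Delta tables; the plain-Python version only illustrates the shape of the operation.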

 

Where does data transformation fit in?

 

In short, the lakehouse is only possible with a strong data integration and transformation engine that can not only access various data sources but also orchestrate the data flows and transformations across these data types. Transformations into typical enterprise data warehouse data models feed traditional BI and analytics tools such as Databricks SQL. And data engineering teams can easily productionize data science pipelines via self-documenting transformation workflows across a variety of virtualized tables.

Most excitingly, the Matillion ETL platform allows data scientists to create easily repeatable workflows that can be productionized to create enriched datasets, which are immediately available in the underlying Spark execution engine. This makes repeatable data science possible in a way that other architectural paradigms rarely allow.

 

Matillion ETL: At the forefront of lakehouse data transformation

 

 

With Matillion ETL for Delta Lake on Databricks, we continue to support our user communities by helping them democratize data ingestion and transformation across even the most cutting-edge cloud computing platforms available. 

 

Technology that fits with Delta Lake on Databricks and Databricks SQL

 

Matillion is unique in its ability not only to source data from myriad sources, but also to transform that data into an analytics-ready structure on top of Delta Lake, ready for use in Databricks SQL. It’s the perfect marriage of data ingestion and transformation, paving the way for powerful machine learning, data science, BI, and SQL across an organization.

 

Supporting operational and business value

 

Matillion ETL for Delta Lake on Databricks gives Databricks users the ability to easily load data from numerous sources into Delta Lake without needing to hand-code pipelines. Using Matillion ETL’s graphical user interface, data professionals of all technical abilities can then transform data quickly and easily for data science and machine learning, leveraging the power and scalability of the Delta Engine, a vectorized query engine optimized for modern structured and semi-structured workloads. Because we’ve purpose-built Matillion ETL for Delta Lake on Databricks, it integrates seamlessly with the platform and encodes numerous best practices to help customers quickly build out their lakehouse vision. Matillion specifically harnesses the power of Delta Lake and the Delta Engine via platform-specific pushdown instruction sets, providing maximum pipeline performance.

 

Repeatable pipelines and processes for streamlined workflows 

 

Both Matillion ETL and Delta Lake on Databricks champion repeatable patterns and processes. Combining the two technologies makes it easier than ever to repeatedly refresh both datasets and the machine learning models they support.

 

Matillion ETL for Delta Lake on Databricks empowers developers to set the guardrails for no-code transformation jobs via reusable building blocks and parameterized Shared Jobs. This boosts data engineer productivity, allowing engineers to build reusable data pipelines and self-documenting visual transformation workflows that a variety of business audiences can easily replicate and customize.
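As an illustration of what parameterized, guardrailed building blocks can look like, here is a minimal Python sketch. It does not reflect Matillion’s actual Shared Job implementation or API; `Parameter`, `SharedJob`, and the table names are hypothetical. The developer fixes the SQL template and declares which parameters exist and which values they may take, while each business user supplies only those parameter values.

```python
from dataclasses import dataclass
from string import Template
from typing import Optional


@dataclass(frozen=True)
class Parameter:
    """Hypothetical job parameter with a default and an optional allow-list."""
    name: str
    default: str
    allowed: Optional[frozenset] = None  # None means any value is accepted


@dataclass(frozen=True)
class SharedJob:
    """Hypothetical reusable job: a fixed SQL template plus declared parameters."""
    name: str
    sql_template: str
    parameters: tuple

    def render(self, **overrides: str) -> str:
        declared = {p.name: p for p in self.parameters}
        # Guardrail 1: callers may only set parameters the developer declared.
        unknown = set(overrides) - set(declared)
        if unknown:
            raise ValueError(f"unknown parameters: {sorted(unknown)}")
        values = {}
        for name, param in declared.items():
            value = overrides.get(name, param.default)
            # Guardrail 2: values must come from the allow-list, if one exists.
            if param.allowed is not None and value not in param.allowed:
                raise ValueError(f"{name}={value!r} not in {sorted(param.allowed)}")
            values[name] = value
        # Values are substituted as plain text; the allow-lists keep this safe here.
        return Template(self.sql_template).substitute(values)


dedupe = SharedJob(
    name="dedupe_table",
    sql_template="CREATE OR REPLACE TABLE $target AS SELECT DISTINCT * FROM $source",
    parameters=(
        Parameter("source", "raw.orders", frozenset({"raw.orders", "raw.customers"})),
        Parameter("target", "clean.orders", frozenset({"clean.orders", "clean.customers"})),
    ),
)

print(dedupe.render())
print(dedupe.render(source="raw.customers", target="clean.customers"))
```

The same job definition is reused for every run; only the parameter values change, and anything outside the declared parameters or allow-lists is rejected before any SQL is produced.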

 

Complementary to Spark

 

The beauty of the lakehouse is that it brings ACID compliance and standard data warehouse features to a traditional big data stack. This allows data scientists and engineers to continue using tools like Spark in their native setting, while bringing a host of business applications to bear on top of that existing toolset. With Delta Lake on Databricks, the enterprise can finally envision a unified analytics data store addressing all of its business needs.
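To show how ACID guarantees can sit on top of plain files in a data lake, here is a heavily simplified, in-memory Python model of the transaction-log approach Delta Lake is built on. Each commit is one numbered log entry listing files added and removed, readers reconstruct a table snapshot by replaying entries, and writers use optimistic concurrency, so a commit fails if another writer already claimed the same version. The `TransactionLog` class, its methods, and the file names are hypothetical and do not reflect Delta Lake’s actual log format or API.

```python
from dataclasses import dataclass, field


class ConcurrentCommitError(Exception):
    """Raised when another writer has already committed the target version."""


@dataclass
class TransactionLog:
    # Hypothetical in-memory log: version -> list of (action, file_path) tuples.
    commits: dict = field(default_factory=dict)

    def latest_version(self) -> int:
        return max(self.commits, default=-1)

    def snapshot(self, version=None) -> set:
        """Replay commits 0..version to get the set of live data files."""
        if version is None:
            version = self.latest_version()
        files = set()
        for v in range(version + 1):
            for action, path in self.commits[v]:
                if action == "add":
                    files.add(path)
                else:  # "remove"
                    files.discard(path)
        return files

    def commit(self, read_version: int, actions: list) -> int:
        """Publish all actions as a single new version, or fail as a whole."""
        new_version = read_version + 1
        if new_version in self.commits:  # put-if-absent: another writer won the race
            raise ConcurrentCommitError(f"version {new_version} already exists")
        self.commits[new_version] = list(actions)
        return new_version


log = TransactionLog()
v0 = log.commit(-1, [("add", "part-0.parquet"), ("add", "part-1.parquet")])
reader_view = log.snapshot(v0)  # a reader pins version 0

# Writer A replaces part-0 in one atomic commit.
v1 = log.commit(v0, [("remove", "part-0.parquet"), ("add", "part-2.parquet")])

# Writer B also read version 0, so its commit is rejected rather than half-applied.
try:
    log.commit(v0, [("add", "part-3.parquet")])
except ConcurrentCommitError as err:
    print("retry needed:", err)

print(sorted(log.snapshot()))  # ['part-1.parquet', 'part-2.parquet']
print(sorted(reader_view))     # ['part-0.parquet', 'part-1.parquet']
```

Readers never see a partially applied change, and the reader pinned to version 0 keeps a consistent view while writers move on. In Delta Lake, a writer that loses this race checks whether the winning commit actually conflicts with its own changes and, if not, retries at the next version; the sketch simply raises.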

 

To take advantage of this new shared environment, transformation tools like Matillion ETL give business users a code-optional way to access and transform data into the analytics-ready datasets needed in both advanced and traditional BI applications. Meanwhile, data scientists can easily build single-click refreshable models that are both reusable across the entire business and immediately accessible in the Spark engine they know and love.

 

Matillion and Databricks, better together

To get started with your data journey, click here to request a demo or register for a free trial.

 

Watch this in-depth demonstration of the combined power of Matillion ETL and Databricks together:


The post Better Together: Matillion ETL for Delta Lake on Databricks Now Available appeared first on Matillion.