Evolving Cloud Data Platform Opportunities in 2021: The Lakehouse

Cloud data management is on the rise and enterprises are taking note. Snowflake’s IPO in fall 2020, along with the pandemic-driven push for faster data insights, has increased the speed at which companies are implementing cloud data strategies. In a recent survey of 200 enterprise data professionals, we found a clear appetite for diverse cloud data architectures:

  • More than one-third (38%) are already using cloud data warehouses (CDWs). Long term, 43% expect to have all of their data in the cloud, with the remainder planning to pursue hybrid models that leverage both cloud and on-premises data warehouses.
  • While the use of CDWs is already widespread, only 16% currently use data lakes. More than half (56%) plan to use data lakes in the future, and another 26% are considering doing so.
  • 57% will leverage a hybrid strategy (on-premises and cloud) for data management, while 22% are planning a multi-cloud strategy and 21% will use a single cloud provider to manage all of their cloud-based data.
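
As a quick sanity check on the figures above, here is a minimal sketch that tallies the reported percentages. The dictionary keys and function names are our own shorthand, not labels from the survey itself.

```python
# Percentages as reported in the survey bullets above.
# Keys are informal shorthand, not the survey's own answer labels.
DATA_LAKE_ADOPTION = {"use today": 16, "plan to use": 56, "considering": 26}
CLOUD_STRATEGY = {"hybrid": 57, "multi-cloud": 22, "single provider": 21}


def unaccounted(shares: dict) -> int:
    """Percentage points not covered by the listed answers."""
    return 100 - sum(shares.values())


def largest(shares: dict) -> str:
    """Answer with the biggest share."""
    return max(shares, key=shares.get)


if __name__ == "__main__":
    for name, shares in [("data lakes", DATA_LAKE_ADOPTION),
                         ("cloud strategy", CLOUD_STRATEGY)]:
        print(name, "largest:", largest(shares),
              "unaccounted:", unaccounted(shares))
```

The cloud strategy answers add up to 100%, while the data lake answers add up to 98%. The remaining 2% are respondents who gave none of the three answers listed here.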

A few emerging trends in cloud data infrastructure will help shape the enterprise environment in 2021 and beyond. The most prevalent of these is the arrival of new platforms that blend two data storage solutions for better data access and faster time to insight.

Blurring the line between data lakes and data warehouses

An emerging trend for 2021 is the blending of data lakes and data warehouses. Databricks released its Delta Lake product, a fully featured data warehouse built on a big data stack. Microsoft Azure released Azure Synapse, which brought together all of its big data and data warehousing technologies under a single brand. The market has evolved towards the lakehouse, an architecture that aims to combine the best of the structured and semi-structured worlds.

In a data warehouse, storage formats are specific to that warehouse, and in some warehouses storage is also tied to compute. As a result, customers have to decide where to store their data depending on the use case. Among other things, the lakehouse tackles this problem. In this architecture, all of the data is accessible to everyone in the organization: for building new data transformations and visualizations, for powering BI and AI, and for advanced users doing deep machine learning on prepped data. Anyone who has done deep machine learning knows that most of the work is in simple data preparation, and the lakehouse makes that process much more efficient. The architectural paradigms have shifted and simplified over time. They now require less effort to maintain and therefore cost less to implement, so the time to value is much quicker with this new approach.
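
To make the idea concrete, here is a toy sketch of the pattern described above: one copy of the data in an open format, one shared preparation step, and two different consumers (a BI-style aggregate and ML-style features). It uses only the Python standard library and does not use any Delta Lake, Databricks, or Azure Synapse API. CSV stands in for an open table format, and the column names, sample rows, and function names are all hypothetical.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path

# Hypothetical sample data; column names are made up for illustration.
ROWS = [
    {"region": "EMEA", "amount": "120.0"},
    {"region": "EMEA", "amount": "80.0"},
    {"region": "APAC", "amount": "200.0"},
    {"region": "APAC", "amount": ""},  # missing value that prep must handle
]


def write_shared_table(path: Path) -> None:
    """Store one copy of the data in an open format (CSV as a stand-in)."""
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["region", "amount"])
        writer.writeheader()
        writer.writerows(ROWS)


def read_clean_rows(path: Path) -> list:
    """Shared preparation step: drop rows with missing amounts, cast types."""
    with path.open(newline="") as f:
        return [
            {"region": r["region"], "amount": float(r["amount"])}
            for r in csv.DictReader(f)
            if r["amount"]
        ]


def bi_totals(path: Path) -> dict:
    """BI-style consumer: total amount per region."""
    totals = defaultdict(float)
    for row in read_clean_rows(path):
        totals[row["region"]] += row["amount"]
    return dict(totals)


def ml_features(path: Path) -> list:
    """ML-style consumer: one-hot region plus amount scaled by the maximum."""
    rows = read_clean_rows(path)
    regions = sorted({r["region"] for r in rows})
    max_amount = max(r["amount"] for r in rows)
    return [
        [float(r["region"] == name) for name in regions]
        + [r["amount"] / max_amount]
        for r in rows
    ]


def demo() -> tuple:
    """Write the table once, then serve both consumers from the same file."""
    with tempfile.TemporaryDirectory() as tmp:
        table = Path(tmp) / "sales.csv"
        write_shared_table(table)
        return bi_totals(table), ml_features(table)
```

Calling `demo()` returns the per-region totals and the feature rows, both derived from the same file through the same cleaning step. In a real lakehouse the file would typically be a columnar table and the consumers would be separate engines, but the design point is the same: preparation happens once, against one copy of the data.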

Databricks has a best-in-class product that incorporates lakehouse technology. Matillion ETL for Delta Lake on Databricks, a cloud data transformation product purpose-built for the lakehouse with Databricks SQL, is available to orchestrate data flows and transformations. Azure Synapse merges cloud data warehousing and big data analytics into a single service platform. That means that if you’re trying to assess whether you need a cloud data warehouse, a data lake, or both, Azure Synapse Analytics can cover both of those needs. Together, Azure Synapse Analytics and Matillion ETL provide ingestion, transformation, and preparation of data for use with Power BI and Azure Machine Learning for advanced analytics.

Don’t forget to focus on supporting technologies

To take advantage of the lakehouse concept, it is important to select the supporting cloud data solutions thoughtfully, including ETL/ELT, data cataloging, and data governance tools.

To maximize your cloud investments, keep these tips in mind:

  1. Separate compute from storage for maximum productivity – platforms and systems that couple storage with compute can be difficult to manage, so be sure to consider alternatives.
  2. Rent in the cloud, don’t buy hardware – cloud compute shifts power towards lean data teams, and cloud scalability across platforms lets them respond rapidly to the demand for analytics.
  3. Choose neutral ETL orchestration platforms to prevent technology lock-in.
  4. Enforce strong transformation practices to achieve the best results on any architecture pattern.

Watch the lakehouse in action

Learn more about the lakehouse in this keynote presentation and demo from Big Data London, featuring:

  • Ed Thompson, CTO, Matillion
  • David Langton, VP Product, Matillion
  • Jamie Cole, Product Manager, Matillion
  • Ajay Singh, VP, Field & Partner Engineering, Databricks

The post Evolving Cloud Data Platform Opportunities in 2021: The Lakehouse appeared first on Matillion.