Our customers have special data storage requirements, especially around volume and variety (unstructured data). To meet their needs, they are employing data lake architectures.
A data lake can store data at any volume and make it available for analysis without transforming it first. This ability addresses these requirements by providing a cost-effective resource for scaling, storing, and accessing large volumes of diverse data types. Data lakes, however, do not come without their challenges. In this blog, we outline common challenges and the 4 Pillars of Governance that can help your business overcome them.
Register for our webinar “Designing a Data Lake – Lessons Learned from Experts” and attend live on April 2nd, 2019 at 11 AM EST, or watch on-demand after the webinar, to learn more about data lakes from experts.
Data Lake Challenges
Although a data lake is a great solution, the flexibility that makes it attractive is also its main risk. Recall how we define a data lake: it allows for the ingestion of large amounts of raw structured, semi-structured, and unstructured data that can be stored en masse and called upon for analysis as and when needed. Because data is stored raw and en masse, this definition carries inherent risks and can lead to the dreaded Data Swamp, which many organizations have fallen prey to.
PwC quotes Sean Martin, CTO of Cambridge Semantics, as saying: “We see customers creating big data graveyards, dumping everything [into the data lake] and hoping to do something with it down the road. But then they lose track of what’s there. The main challenge is not creating a data lake, but taking advantage of the opportunities it presents.”
This observation is reinforced by research from Gartner, which warns: “the data lake will end up being a collection of disconnected data pools or information silos all in one place…Without descriptive metadata and a mechanism to maintain it, the data lake risks turning into a data swamp.”
4 Pillars of Data Governance
To mitigate these risks, build a governance layer into your architecture. The 4 Pillars of Governance answer the following questions:
- What data do you have and where is it stored? (Data Catalog)
- Where has data come from and what has happened to it? (Data Lineage)
- Is data accurate and fit for purpose? (Data Quality)
- Is data protected from unauthorized access? (Data Security)
This governance layer is a combination of process and tooling. While these may increase the total cost of ownership (TCO) of a solution, they also make a return on investment (ROI) more likely.
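As a rough illustration of how the four questions can become something concrete, the sketch below models one governance record per dataset and reports which pillars still have no answer. It is a minimal sketch, not a reference to any particular catalog product: the `DatasetRecord` class, its field names, the `governance_gaps` and `can_read` helpers, and the example storage paths are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class DatasetRecord:
    """Hypothetical governance record for one dataset in the lake."""
    name: str
    location: str                                               # Data Catalog: what and where
    sources: List[str] = field(default_factory=list)            # Data Lineage: where it came from
    transformations: List[str] = field(default_factory=list)    # Data Lineage: what happened to it
    quality_checked: bool = False                               # Data Quality: validated as fit for purpose
    allowed_roles: Set[str] = field(default_factory=set)        # Data Security: who may read it


def governance_gaps(record: DatasetRecord) -> List[str]:
    """Return the pillars for which this record has no answer yet."""
    gaps = []
    if not record.name or not record.location:
        gaps.append("Data Catalog")
    if not record.sources:
        gaps.append("Data Lineage")
    if not record.quality_checked:
        gaps.append("Data Quality")
    if not record.allowed_roles:
        gaps.append("Data Security")
    return gaps


def can_read(record: DatasetRecord, role: str) -> bool:
    """Allow access only to roles explicitly listed on the record."""
    return role in record.allowed_roles


# A dataset that was dumped into the lake with only a name and a path.
raw_events = DatasetRecord(name="raw_events", location="s3://example-lake/raw/events/")

# A dataset with all four pillars answered.
curated = DatasetRecord(
    name="curated_orders",
    location="s3://example-lake/curated/orders/",
    sources=["raw_orders"],
    transformations=["deduplicated", "currency normalized"],
    quality_checked=True,
    allowed_roles={"analyst"},
)

print(governance_gaps(raw_events))  # ['Data Lineage', 'Data Quality', 'Data Security']
print(governance_gaps(curated))     # []
```

In this sketch, a dataset with only a name and a location is exactly the kind of entry that drifts toward a data swamp: it can be found, but nobody can say where it came from, whether it can be trusted, or who is allowed to use it.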
To learn more about data lakes and how to incorporate data governance into your data lake architecture, register for our webinar “Designing a Data Lake – Lessons Learned from Experts”.