When big data first emerged, it was defined by three things:
- The sheer volume of data
- The velocity at which it is produced
- And the variety of data types and sources
But as data and business have begun shifting to the cloud, the 3 V’s are giving way to a new concept: the 3 D’s.
The 3 D’s: Distributed, diverse, and dynamic
The digital transformation that is largely happening in the cloud is changing the way we think about data. Instead of a monolith of big data living in a data center or on-premises, the cloud has brought about a more fluid construct, where we see data as distributed, diverse, and dynamic. Where data comes from, where it lives, and what we do with it are constantly shifting and evolving.
The people who use data are also more distributed, diverse, and dynamic. Remote work is now the norm. According to an IDC survey, the pandemic accelerated a move to the cloud for 68 percent of businesses (1). We are leveraging data from more sources than ever in databases, cloud data warehouses, and applications. We are rapidly creating, consuming, and moving data between platforms. And there are more people across the business using data than ever: frontline workers, knowledge workers, managers, and executives all say that data analytics insight affects their decision making.
The 3 D’s can create a storm of data and activity that can swamp a business that’s unprepared. It can also create a wave of opportunity, the crest of which we can ride to greater insight and innovation that transforms the business. Here’s how to stay on top of the 3 D’s of data and turn them into an asset for your enterprise.
Distributed data: Compounded and replicated across the ecosystem
Here’s a breathtaking number: We created 64 zettabytes of data in 2020 (2). How much is that? According to Stewart Bond, Director, Data Integration Software at IDC, streaming video on your laptop 24 hours a day for 57 million years would add up to just 1 zettabyte (3).
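To put that comparison in perspective, here is a rough back-of-the-envelope check. It is only a sketch: it assumes a decimal zettabyte (10^21 bytes) and a 365.25-day year, and the function name is made up for illustration. Neither assumption comes from IDC.

```python
# Back-of-the-envelope check of the "57 million years of streaming" comparison.
# Assumptions (not from IDC): 1 ZB = 10**21 bytes, 1 year = 365.25 days.
ZETTABYTE_BYTES = 10**21
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def implied_bitrate_mbps(total_bytes: float, years: float) -> float:
    """Average bitrate (megabits per second) needed to move total_bytes in the given years."""
    seconds = years * SECONDS_PER_YEAR
    return total_bytes * 8 / seconds / 1e6


# Bitrate implied by "1 zettabyte in 57 million years of nonstop streaming".
rate = implied_bitrate_mbps(ZETTABYTE_BYTES, 57e6)

# At that same rate, how long would the 64 zettabytes created in 2020 take?
years_for_2020 = 64 * 57e6

print(f"Implied stream rate: {rate:.2f} Mbit/s")
print(f"Years to stream 64 ZB at that rate: {years_for_2020:,.0f}")
```

Under these assumptions the implied rate comes out at roughly 4.4 Mbit/s, which is in the range of an ordinary video stream, and the 2020 total works out to more than 3.6 billion years of continuous viewing.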
But almost as astounding as that number is where the data is located and what it looks like. Ninety percent of that data is generated by replication (4). Data created in operational systems is typically copied multiple times into data warehouses and data lakes for analytics and reporting, data science, and ML/AI. The number of data sources in the cloud is also exploding. Every day, someone spins up a data silo that didn’t exist 12 months ago and you’re asked to pull in that data for analytics.
The solution: Centralize distributed data
One of the most critical ways to get a handle on widely distributed data is to centralize it, and the cloud is becoming the preferred destination. Enterprises need easy, frictionless ways to get data into the cloud. This has contributed to the rise of quick, accessible pipeline builders like Matillion Data Loader. No-code, wizard-based pipeline creation helps data teams rapidly load data into the cloud from data sources, both in the initial migration stages of the cloud data journey and on an ongoing basis. And Matillion ETL lets data teams create their own API connectors to virtually any data source, without relying on vendors to provide them or engineers to hand-code them.
Dynamic data: Moving data where it’s needed
Data teams are collecting, creating, and working with data across a vast ecosystem that includes the cloud, on-premises data architectures, and a variety of applications. On top of that, some data, like IoT data, arrives so quickly and in such volumes that it is not only hard to leverage but also goes out of date unless acted upon quickly. Much of this data is replicated, and only a fraction of it gets stored: 1 to 2 percent of the total, according to IDC (5). Data teams need the ability to move data where they need it and prepare it for analytics without wasting time.
The solution: Team agility and data integrity
When data is dynamic, teams need to be as well. That’s where low-code tools that foster agility come in handy. Any tools that help teams collaborate, see logic visually, and standardize data processes are invaluable to a modern data team that’s moving data across systems and clouds. For example, reusable components and the ability to sample live data at any stage in the data integration process tremendously speed up pipeline development and enable data teams to be agile and dynamic in response to challenges in the cloud.
As data gets shared and used across systems, it’s also important to ensure that it retains its integrity. A well-maintained single source of truth, automated processes, error handling, and audit logging are all features that can help preserve data integrity, even in the face of dynamic conditions.
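As a rough illustration of how error handling and audit logging can work together, the sketch below wraps a single pipeline step so that bad rows are set aside and logged rather than failing the whole batch. It uses only Python’s standard library; the step name, the `run_step` and `parse_order` helpers, and the sample rows are all hypothetical and do not represent any Matillion feature.

```python
import logging
from typing import Callable, Iterable

# Audit logger for the pipeline; the logger name is an arbitrary choice.
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
audit_log = logging.getLogger("pipeline.audit")


def run_step(name: str, transform: Callable[[dict], dict], rows: Iterable[dict]):
    """Apply transform to each row; log and set aside rejects instead of aborting."""
    loaded, rejected = [], []
    for row in rows:
        try:
            loaded.append(transform(row))
        except (KeyError, ValueError, TypeError) as exc:
            rejected.append(row)
            audit_log.warning("step=%s rejected row=%r reason=%r", name, row, exc)
    # One summary line per step gives an audit trail of what was loaded.
    audit_log.info("step=%s loaded=%d rejected=%d", name, len(loaded), len(rejected))
    return loaded, rejected


def parse_order(row: dict) -> dict:
    """Hypothetical transform: coerce raw string fields into typed values."""
    return {"order_id": int(row["order_id"]), "amount": float(row["amount"])}


rows = [
    {"order_id": "1", "amount": "19.99"},
    {"order_id": "2", "amount": "n/a"},   # bad amount
    {"amount": "5.00"},                   # missing order_id
]
loaded, rejected = run_step("parse_orders", parse_order, rows)
```

Keeping the rejected rows alongside the log entries means nothing is silently dropped: the team can inspect, fix, and replay them later.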
Diverse data: Many parts of a whole
According to IDC, two-thirds of organizations are integrating up to three different types of data, and many are integrating four, five, or six (6). And all of that data is going into nine different data management technologies that survey respondents identified, including analytical and relational databases, streaming video, the mainframe, object stores, and of course, the ubiquitous spreadsheet (7). Turning heterogeneous data into something usable for analytics and insight generation while still maintaining its integrity is a real challenge.
The solution: Data transformation that can be flexible, depending on the need
When dealing with data diversity, the role of data transformation becomes even more critical. Homogenizing and harmonizing data from anywhere and everywhere helps ensure that the output you get from analytics is accurate and meaningful. And for truly diverse data, there’s no one right answer for how that happens. The best solution is an ETL platform that’s ready for anything: one that uses pre-built logic blocks and automation to speed productivity, while still allowing for customization and unique use cases where required. Data teams can then handle everything from common transformations to edge cases, and the end result is faster, more productive work and better quality data.
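To make the idea of harmonizing diverse data concrete, here is a minimal sketch that maps records from two differently shaped sources onto one common schema. The sources, field names, and formats are invented for illustration; a real ETL platform would typically offer this kind of mapping as pre-built components rather than hand-written code.

```python
from datetime import date, datetime


def from_crm(rec: dict) -> dict:
    """Hypothetical CRM export: ISO dates, dollar amounts stored as strings."""
    return {
        "customer": rec["CustomerName"].strip().lower(),
        "amount_usd": float(rec["Total"]),
        "order_date": date.fromisoformat(rec["Date"]),
    }


def from_spreadsheet(rec: dict) -> dict:
    """Hypothetical spreadsheet: US-style dates, amounts stored as integer cents."""
    return {
        "customer": rec["name"].strip().lower(),
        "amount_usd": rec["cents"] / 100,
        "order_date": datetime.strptime(rec["ordered"], "%m/%d/%Y").date(),
    }


# One mapper per source; supporting a new source means adding one function.
MAPPERS = {"crm": from_crm, "spreadsheet": from_spreadsheet}


def harmonize(source: str, rec: dict) -> dict:
    """Convert a source-specific record into the common schema."""
    return MAPPERS[source](rec)


records = [
    ("crm", {"CustomerName": " Acme Corp ", "Total": "12.50", "Date": "2021-04-15"}),
    ("spreadsheet", {"name": "ACME CORP", "cents": 1250, "ordered": "04/15/2021"}),
]
unified = [harmonize(source, rec) for source, rec in records]
```

After harmonization, both records share the same keys, types, and units, so downstream analytics can treat them as one data set regardless of where they originated.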
Learn more about mastering the 3 D’s: Get the IDC Technology Spotlight
Want to learn more about tackling the 3 D’s and the other data challenges familiar to data teams today? Get the latest IDC Technology Spotlight, Calming the Storm with Cloud-Native Data Integration, by Stewart Bond, Director, Data Integration Software at IDC, to see how cloud-native data platforms can help you make sense of distributed, diverse, and dynamic data.
(1) “US-Cloud Database Migration and Architecture Survey,” IDC, 2021, N=406.
(2) “Worldwide Global Datasphere Forecast 2021-2025,” IDC, March 2021.
(3) “Calming the Storm of Cloud Data Management,” Matillion and IDC Webinar, April 2021.
(4) “US-Cloud Database Migration and Architecture Survey,” IDC, 2021, N=406.
(5) “Worldwide Global Datasphere Forecast 2021-2025,” IDC, March 2021.
(6) “Data Ops Survey,” IDC, 2021, n=401.
The post Goodbye 3 V’s, Hello, 3 D’s: Tackling Distributed, Diverse, Dynamic Data appeared first on Matillion.