While data volumes rocket, information accumulation lags
It’s hardly news that the world is awash with data—not just petabytes of it, but exabytes of it. And the rate of accumulation is accelerating rapidly: according to one estimate, 90% of all the data in the world has been accumulated during the past two years.
But if that is not exactly news, it’s not exactly new, either. Consider the following quote:
“Millions of dollars yearly are spent in the collection of data, with the fond expectation that the data will automatically cause the correction of the condition studied. Though accurate data and real facts are valuable, when it comes to getting results, the manner of presentation is ordinarily more important than the facts themselves.”
Nothing remarkable there, perhaps. But what will probably surprise you is that these sentiments were expressed over a hundred years ago.
That’s right: the words in question come from Graphic Methods for Presenting Facts, written by American information visualisation pioneer Willard C. Brinton, and published in 1914. (Now out of copyright, it’s also freely available on the Internet.)
Brinton, like others who have followed, neatly sums up the situation that the world finds itself in. We have data aplenty, but data on its own rarely provides answers: instead, answers and insights come from analysing the data—which isn’t always straightforward.
Consequently, while data volumes are soaring, the insights and learnings we draw from that data aren’t growing at anything like the same rate.
The facts speak for themselves: if 90% of all the data in the world has been accumulated during the past two years, it’s certainly not the case that 90% of the world’s knowledge has arisen during the same period. Far from it.
In short, there’s an information bottleneck at work, and a failing of data productivity.
ETL in the slow lane
So how does this information bottleneck arise? What causes the drag on data productivity, especially at the Big Data level, where insights are seemingly failing to keep pace with the growth of the data itself?
To some extent, the answer lies in the limitations of present-day analytics and visualisation approaches. Advances have been made, to be sure—and are still being made, to the present day.
Consider the analytics language R and the vast set of analytics libraries in the R ecosystem. Consider new ways of thinking about and exploring data—box plots, sparklines, and so on. And consider recent developments such as in-memory computing and cloud computing, which respectively extend the scope and reduce the cost of deploying analytics.
But while the sexy business of analytics itself has seen the development dollars, the far less sexy business of ETL has stayed in the slow lane. And yet capable Extract, Transform and Load technologies are fundamental to the success of analytics technologies, especially at Big Data scale, and even more so when the requirement is for near real-time results.
Data that takes hours to load, in other words, won’t be providing insightful answers any time soon.
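To make the Extract, Transform, Load pattern concrete, here is a minimal sketch in Python. All of the names, fields, and sample data below are hypothetical illustrations—a real pipeline would extract from a source system and load into a warehouse such as Amazon Redshift rather than an in-memory list.

```python
import csv
import io

def extract(raw_csv: str) -> list[dict]:
    """Extract: parse raw source data into records."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(records: list[dict]) -> list[dict]:
    """Transform: clean and reshape records for analysis."""
    return [
        {"region": r["region"].strip().upper(),
         "revenue": float(r["revenue"])}
        for r in records
        if r["revenue"]  # drop rows with missing revenue
    ]

def load(records: list[dict], table: list[dict]) -> None:
    """Load: append records to the target (a list stands in for a warehouse table)."""
    table.extend(records)

# Hypothetical sample input: one messy region name, one missing revenue value.
raw = "region,revenue\n north ,100.5\nsouth,200\neast,\n"
warehouse_table: list[dict] = []
load(transform(extract(raw)), warehouse_table)
print(warehouse_table)
# [{'region': 'NORTH', 'revenue': 100.5}, {'region': 'SOUTH', 'revenue': 200.0}]
```

The point of the sketch is the shape of the work, not the volume: each stage is simple in isolation, but at Big Data scale the transform and load steps dominate the time-to-insight, which is exactly the bottleneck described above.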
Simpler, better, faster
Which is why we at Matillion are proud of the role that we’re playing in battling the world’s information bottleneck with our ETL tool, Matillion ETL for Amazon Redshift.
Fast-growing and disruptive, it delivers structured and semi-structured data integration in the Cloud. An enterprise-ready data integration platform, it’s a hundred times faster than traditional ETL technology—as well as being easier to use, simpler to deploy, and significantly cheaper.
And delivered via the AWS Marketplace, Matillion ETL for Amazon Redshift has a rapidly growing portfolio of Fortune 500 and ‘born on the cloud’ hi-tech customers—the very businesses that are often at the forefront of exa-scale data generation.