One of our highlights of AWS re:Invent 2020 was Dave Langton’s presentation, “Improving Analytics Productivity for Overwhelmed Data Teams.” Today’s data teams struggle with what we call the “Three Vs” of modern data:
- Volume: This year, Matillion customers loaded 5.4 trillion rows of data per month into cloud data warehouses. And that number continues to go up.
- Variety: Enterprise organizations used an average of 1,080 data sources in enterprise analytics (IDG, 2019).
- Velocity: Business is moving faster than ever. In the stock market, an RPA bot takes only 5 milliseconds to respond to world news events with a trade. Organizations need to act on information as close to real time as possible, which means that information has to be available and ready for analytics.
Trying to keep up with the Three Vs of data is enough to swamp even the largest of data teams. Data engineers, architects, and analysts across businesses and industries are overwhelmed and looking for ways to get relief while still keeping work moving.
Data teams must improve efficiency to keep up
For data teams to continue bringing data in from various sources and making it analytics-ready at the pace that the business demands (i.e., as soon as possible), the most effective strategy is to find efficiencies that speed up work: minimize the time it takes to do more “custodial” data tasks like coding pipelines, do more work in parallel, and productionize workflows so that multiple team members can step in at any time.
The right tools can help you improve data team efficiency and analytics productivity to not only reduce the workload of your teams, but also help people across the organization get to insights faster. Here are five ways modern data teams can begin to move at modern speeds.
1. Reuse and borrow to create repeatable processes
If you run a data analysis that yields useful knowledge for end users, they are going to want you to run that analysis again. And again. If you can build repeatable pipelines and processes into saved jobs, that’s going to be immensely helpful to your future self. In coding, one of the first things developers will do is look at libraries that they can import and reuse. Data tools should offer that same functionality and support the same mental model.
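To make the "build it once, run it again" idea concrete, here is a minimal Python sketch of a repeatable job assembled from small reusable steps. It is an illustration only, not how Matillion implements saved jobs: the helper names (`build_pipeline`, `drop_missing`, `rename`) and the sample columns are hypothetical.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Step = Callable[[List[Row]], List[Row]]


def build_pipeline(*steps: Step) -> Step:
    """Compose small steps into one reusable, repeatable job."""
    def run(rows: List[Row]) -> List[Row]:
        for step in steps:
            rows = step(rows)
        return rows
    return run


def drop_missing(field: str) -> Step:
    """Return a step that removes rows where `field` is absent or None."""
    return lambda rows: [r for r in rows if r.get(field) is not None]


def rename(old: str, new: str) -> Step:
    """Return a step that renames one column and leaves the others untouched."""
    return lambda rows: [
        {(new if key == old else key): value for key, value in r.items()}
        for r in rows
    ]


# Define the job once (hypothetical column names) ...
clean_orders = build_pipeline(
    drop_missing("amount"),
    rename("amt_ccy", "currency"),
)

# ... then rerun it, unchanged, on every new batch of rows.
batch = [
    {"id": 1, "amount": 10.0, "amt_ccy": "USD"},
    {"id": 2, "amount": None, "amt_ccy": "EUR"},
]
result = clean_orders(batch)
print(result)
```

Because each step is a plain function, a teammate can import `drop_missing` or `rename` into a different job instead of rewriting them, which is the same habit developers already have with libraries.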
Rely on Shared Jobs
If you’re going to build a data dimension, you are definitely not the first person in the world to do that. You don’t need to start from zero. Someone has likely created a code block or a job out there that you can import to save time and effort. At Matillion, we call these Shared Jobs, and we have a whole library of them on Matillion Exchange.
Below is an example of a Shared Job to build a schema detector.
This job didn’t ship with Matillion ETL; it was created in Matillion by one of our users. It’s shared, it’s repeatable, and there are dozens of Shared Jobs like it that will make your team more efficient.
Slack saves time with repeatable processes
Slack, a Matillion customer, used repeatable patterns to reduce the number of discrete workflows they had from 10 down to just one. By streamlining and productionizing efforts, they were able to reduce the time needed to generate new reports from six hours down to just 30 minutes. Those aren’t incremental improvements–they’re game changers for the company.
2. Lean into visual design and self-documentation
Take a look at the following two images:
On the left is a SQL script for a job. On the right is the same job in Matillion ETL.
Is it faster to build a simple job in Matillion ETL? Maybe. It’s relatively easy to write a SQL script like the one above in a short amount of time. But think of your future self or your colleagues looking at that script a month or a year later. If you pass it on to another team member, they have to read it to understand it. Deciphering a script after the fact is often more difficult than writing it in the first place. And when many tables are required to produce the right result, SQL gets complicated really quickly.
See your business logic
A visual diagram is readable from the beginning, and more readable over time. You can see the overall flow of the job and the objective. Any member of your data team knows where to begin, and what questions they need to ask. Taking the bulk of the guesswork out of a SQL script by having a visual guide to processes and business logic can be a giant step toward greater efficiency.
3. Unleash the Cloud
If you’re not working with data in the cloud, you’re missing out on a major opportunity to modernize how you work with data and move faster. It might be possible to take advantage of massively parallel processing in a traditional, on-premises data architecture, but a traditional data warehouse can’t match the near-infinite scalability of the cloud. And it certainly can’t match it at an affordable price.
The cloud is faster, more scalable, and more affordable than traditional data architectures. Add in cloud data warehouses and cloud-native tools like Matillion that are built to take full advantage of the speed and scale of the cloud, and data team productivity can skyrocket.
Reduce time to insight significantly in the cloud
Several Matillion customers have seen huge speed gains in the cloud. Docusign reduced its ETL run-time by 72 percent in the cloud. The San Francisco Giants, working with Matillion partner Data Clymer, reduced time to new insights by 50 percent.
4. Leverage the Lakehouse architecture
Different teams use data differently. Data scientists are likely to pull the data they need from a data lake, while data analysts and engineers work within a data warehouse. They are working in two different environments but duplicating data and processes, which creates extra work.
Consider this diagram:
This more traditional architecture shows a split, where one group goes off to a data lake to do data science and the other is working within a data warehouse environment. Why aren’t their needs met using a central data team and data location?
Enter the Lakehouse
Here is a more modern approach, the Lakehouse:
In this scenario, you load data once, then transform and clean it once. Then you make sure all data teams have access to that nice, clean data, whether they’re doing modeling, reporting, or any other activity. By consolidating data in a Lakehouse, you’re consolidating work and making it possible to speed up analytics for faster time to insight.
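As a rough illustration of the "load once, clean once, serve everyone" pattern, the Python sketch below runs a single cleaning step and then feeds the same curated rows to both a reporting function and a modeling function. Everything here is an assumption made for the example: the function names, the in-memory sample data, and the two consumers stand in for real Lakehouse storage and real workloads.

```python
from statistics import mean
from typing import Any, Dict, List

Row = Dict[str, Any]


def load_raw() -> List[Row]:
    """Stand-in for a single ingestion step (hypothetical sample data)."""
    return [
        {"region": " east ", "revenue": "120.5"},
        {"region": "West", "revenue": "80"},
        {"region": "east", "revenue": "99.5"},
    ]


def clean(rows: List[Row]) -> List[Row]:
    """Apply the shared transformation exactly once: normalize text, cast numbers."""
    return [
        {"region": str(r["region"]).strip().lower(), "revenue": float(r["revenue"])}
        for r in rows
    ]


def revenue_by_region(rows: List[Row]) -> Dict[str, float]:
    """Reporting consumer: total revenue per region."""
    totals: Dict[str, float] = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["revenue"]
    return totals


def feature_matrix(rows: List[Row]) -> List[List[float]]:
    """Modeling consumer: revenue plus its deviation from the mean."""
    average = mean(r["revenue"] for r in rows)
    return [[r["revenue"], r["revenue"] - average] for r in rows]


# One load and one clean ...
curated = clean(load_raw())

# ... shared by analysts and data scientists alike.
report = revenue_by_region(curated)
features = feature_matrix(curated)
print(report)
print(features)
```

Neither consumer repeats the trimming or type casting, so a fix to `clean` reaches reporting and modeling at the same time.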
5. Choose tools that foster collaboration
Any tool or platform that enables collaboration is essential for working efficiently within data teams. Collaboration can mean different things. It could mean working independently on parts of projects that will be combined later (for example, using Git). Or it can mean collaborating in real time within a shared workspace. Ideally, you want a tool that supports both collaboration types (as Matillion does).
Ready to move faster?
These are just a few ways you can boost your data team’s productivity with modern, cloud-native tools and repeatable processes. Watch David Langton’s session at AWS re:Invent 2020 to hear more of his insight on improving efficiency. (Registration is free.)
See how Matillion can help your data team improve efficiency
Ready to learn how Matillion can help your data team become less overwhelmed and more productive? Request a demo.