We’re thrilled to welcome back Holt Calder, Data Engineer at InterWorks, for another partner guest blog. Today, Holt talks about the four skills that every modern data team in 2020 must possess. For more tips on closing the data skills gap, download our latest ebook, Build a Modern Data Team.
In the recent blog post, “The Case for Cloud-Based ELT in 2020,” InterWorks’ Data Practice Director Brian Bickell makes a great case for what we call “Modern Cloud Analytics,” a repeatable schematic that aims to kill the trend where analytics projects fail to deliver true business value. At the center of Modern Cloud Analytics (literally and conceptually) is the cloud, but as organizations move to adopt this framework there is still a large unknown. How do you build the right data team in 2020 with the skills to deliver on this modern data strategy?
4 critical skill areas for the modern data team
We can easily pick apart the moving pieces of Modern Cloud Analytics to figure out exactly what skills are necessary to manage and deliver this architecture to an organization. The right data team in 2020 comprises individuals who possess the following characteristics:
- Core SQL skills for using and managing cloud data warehouses
- Comfort and ability to deploy and secure a cloud infrastructure
- Familiarity with frameworks for data orchestration
- Strong understanding of how to apply and document business logic through transformations
Core SQL Skills
When I work with a client who is trying to build a data practice, this is typically the first pillar we focus on. When I kick off an engagement I typically ask, “How comfortable is your team with SQL?” within the first phone call. Comfort with SQL could mean a wide range of things, but core SQL skills really translate into the ability to write queries and manage a schema effectively.
These skills could be gained from really any background. In data engineering, we typically see two walks of life:
- The former database administrator (DBA)
- The software engineer turned data junkie
Focus on the fundamentals
Obviously, for that core set of SQL skills, the DBA will probably have a more relevant background. But often in the modern stack, a DBA will come with a lot of bells and whistles that aren’t always necessary. For example, in an organization using Snowflake, managing indices is a complex skill that is no longer necessary. So don’t let SQL skills be the only criterion you use when building a data team.
Skills like how to manage role-based access control, basic query optimization, and generally just how to write good SQL are key to growing a team that is capable of delivering interesting solutions. If you were to compare a data team to a basketball team, this could be like a fundamental dribbling drill that focuses on technique and form over flashy ball movement.
The second skill area, deploying and securing cloud infrastructure, aligns naturally with Matillion’s position in the Modern Cloud Analytics architecture. At the end of the day, Matillion ETL runs on a virtual machine that the customer is responsible for managing. Managing that VM includes software updates, securing the environment, and provisioning access to end users and data sources.
Secure access to data
Additionally, any teams hoping to take advantage of the scalability of cloud storage also need a strong understanding of how to secure access to that data correctly. Breaches in public Amazon S3 buckets have made news all over the world. Ensuring that your organization isn’t the next security headline has to be a focus. The modern data team is comfortable deploying and managing the process described above from end to end while following best practices for security and access.
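As one illustration of a check a data team might automate, the sketch below scans an S3-style bucket policy document for Allow statements that are open to any principal. The helper name `find_public_statements`, the bucket name and the account ID are all hypothetical. The sketch only inspects the `Effect` and `Principal` fields of the policy JSON and ignores `Condition` blocks and ACLs, so a hit is a prompt for review rather than a verdict, and it is no substitute for your cloud provider’s own access analysis tools.

```python
from typing import Any, Dict, List


def find_public_statements(policy: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the Allow statements whose Principal is the wildcard "*".

    Hypothetical helper: it looks only at Effect and Principal and
    ignores Condition blocks, so treat a hit as a prompt for review.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]

    public = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        # Principal can be "*" or a mapping such as {"AWS": "*"}
        if principal == "*":
            public.append(stmt)
        elif isinstance(principal, dict):
            values = principal.get("AWS", [])
            if isinstance(values, str):
                values = [values]
            if "*" in values:
                public.append(stmt)
    return public


# Illustrative policy; the bucket name and account ID are made up.
sample_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "PublicRead", "Effect": "Allow", "Principal": "*",
         "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::example-bucket/*"},
        {"Sid": "TeamOnly", "Effect": "Allow",
         "Principal": {"AWS": "arn:aws:iam::111122223333:role/data-team"},
         "Action": "s3:*",
         "Resource": "arn:aws:s3:::example-bucket/*"},
    ],
}

flagged = find_public_statements(sample_policy)
print([s["Sid"] for s in flagged])
```

Running a check like this on every policy change is one way to catch an accidentally public bucket before it ships.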
Matillion ETL is a cornerstone of a successful cloud analytics stack. As such, the application can handle sensitive information like database logins, API keys, and key business logic. Securing these objects is critical, and a strong data team can be self-sufficient in handling most of these requirements. The last thing you want is a run-in with network security for leaving your database credentials on a web app open to the world.
Along with securing cloud infrastructure comes the need for an understanding of how to design and decouple applications using cloud infrastructure. Matillion ETL itself is very lightweight and does an incredible job using the scalability of the cloud data warehouse to handle spikes in workload and larger processing requirements. Cloud infrastructure provides multiple entry points for optimization. Learning these entry points and understanding the most impactful ways to optimize a workflow are key traits of the modern data team.
(Hint: It isn’t writing indices.)
Orchestration, the third skill area, is a core activity that the data team designs and executes. How do we move data? When do we decide data needs to be moved?
In the Modern Cloud Analytics stack, there are a few considerations that your team needs to understand in order to answer those questions effectively. Matillion ETL is a relatively fixed cost. The variable cost consideration here is how your cloud data warehouse is being used throughout this process. In a nutshell, we should be aiming to provide high value with each query we execute. In Modern Cloud Analytics, storage is cheap, and compute is expensive. We should lean on storage and keep compute to a minimum accordingly. Orchestration tactics are one way we can achieve that.
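One common orchestration tactic for keeping compute low is an incremental load driven by a high-water mark: store everything, but only process rows that changed since the last run. The sketch below is a minimal illustration in plain Python, not a description of how Matillion ETL implements it. The function `select_new_rows` and the `updated_at` field are assumptions, and timestamps are assumed to be ISO 8601 strings so they sort correctly as text.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]


def select_new_rows(rows: List[Row],
                    watermark: Optional[str]) -> Tuple[List[Row], Optional[str]]:
    """Return the rows updated after the watermark, plus the new watermark.

    Hypothetical helper: assumes each row carries an ISO 8601
    'updated_at' string, which compares correctly as plain text.
    """
    fresh = [r for r in rows
             if watermark is None or r["updated_at"] > watermark]
    new_mark = max((r["updated_at"] for r in fresh), default=watermark)
    return fresh, new_mark


source = [
    {"id": "1", "updated_at": "2020-03-01T08:00:00"},
    {"id": "2", "updated_at": "2020-03-01T09:30:00"},
]

# First run: no watermark yet, so every row is processed.
batch_1, mark = select_new_rows(source, None)

# A new row arrives; the second run only touches that row.
source.append({"id": "3", "updated_at": "2020-03-02T07:15:00"})
batch_2, mark = select_new_rows(source, mark)

print(len(batch_1), len(batch_2), mark)
```

The watermark itself has to be persisted between runs, for example in a small control table in the warehouse.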
3 critical concepts for the modern data team
Data teams need to be well-versed in concepts like idempotency, fault tolerance and error handling. Idempotency on the data team is fundamental to success. This idea means that no matter how many times we run a Matillion ETL job against the same input, it will yield the same result. A byproduct of this consistency is that an idempotent Matillion job doesn’t waste resources processing the same data multiple times if it doesn’t have to, which directly translates into cost savings.
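To make idempotency concrete, here is a minimal sketch that uses Python’s built-in `sqlite3` module as a stand-in for a cloud data warehouse. The `orders` table and the `load_orders` function are hypothetical. The load is keyed on `order_id`, so running it a second time with the same batch leaves the table exactly as it was. Note that this shows only the "same result" half of the idea; skipping work that has already been done is a separate optimization.

```python
import sqlite3
from typing import List, Tuple


def load_orders(conn: sqlite3.Connection,
                rows: List[Tuple[int, float]]) -> None:
    """Load rows keyed on order_id; re-running the same batch changes nothing."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, amount REAL NOT NULL)"
    )
    # INSERT OR REPLACE overwrites the row with a matching key
    # instead of appending a duplicate.
    conn.executemany(
        "INSERT OR REPLACE INTO orders (order_id, amount) VALUES (?, ?)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
batch = [(1, 20.0), (2, 35.5), (3, 12.25)]

load_orders(conn, batch)
first = conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()

load_orders(conn, batch)  # simulate a retry or an accidental second run
second = conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()

print(first == second)
```

A plain `INSERT` in the same position would double the row count on the second run, which is exactly the failure mode idempotent design avoids.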
Durability is a key trait among successful data teams, and fault tolerance is the foundation of that success. The data team should know how to take advantage of their tools to avoid failure if at all possible. Preventing the re-processing of data and downtime of data pipelines can mean a world of difference for your end users and your costs.
When fault tolerance fails us, error handling allows the data team to react quickly. Moving from the proactive stance of fault tolerance to the reactive stance of error handling is unavoidable at times. Errors should be simple to identify and logs should be easy to access, with errors logged and alerts sent according to the SLA of the activity.
A modern data team understands how to define reactive action plans when things go wrong and understands how to take advantage of tools within their stack to send notifications and log relevant information about these failures. Matillion ETL does an exceptional job tackling all three of these considerations, offering an easy export of job metadata, conditional orchestration jobs and an integration with notification services when that failure email needs to be sent.
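The hand-off from fault tolerance to error handling can be sketched in a few lines of Python. The example below is a generic illustration, not Matillion functionality: `run_with_retries`, `flaky_extract` and `always_fails` are hypothetical names, and the `alert` callback stands in for whatever notification service your stack provides. The step is retried with exponential backoff (fault tolerance). Once the attempts are exhausted, the failure is logged, an alert fires and the error is re-raised (error handling).

```python
import logging
import time
from typing import Callable, List

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")


def run_with_retries(step: Callable[[], str], attempts: int = 3,
                     base_delay: float = 0.01,
                     alert: Callable[[str], None] = log.error) -> str:
    """Retry a step with exponential backoff; alert and re-raise on final failure."""
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except Exception as exc:
            log.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                # Out of retries: switch from tolerating to reacting.
                alert(f"step failed after {attempts} attempts: {exc}")
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")


calls = {"n": 0}


def flaky_extract() -> str:
    """Fails twice, then succeeds, like a briefly unavailable source."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("source temporarily unavailable")
    return "extracted"


result = run_with_retries(flaky_extract)

alerts: List[str] = []


def always_fails() -> str:
    raise ValueError("bad credentials")


try:
    run_with_retries(always_fails, attempts=2, alert=alerts.append)
except ValueError:
    pass  # the caller decides what happens after the alert has fired

print(result, alerts)
```

In practice the number of attempts and the alert channel would follow the SLA of the activity: a nightly batch can afford more retries than a near-real-time feed.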
Business Logic and Transformations
Rounding out the skill set of your modern data team is the ability to apply business logic, transform data, and document the process along the way. Matillion’s transformations are SQL-based and push a majority of the processing power down into the data warehouse.
Analytical SQL skills are key
In an environment using Matillion ETL, the data team in 2020 must have at least a few individuals who possess advanced analytical SQL skills. These analytical SQL skills are fundamentally different from the core management topics we discussed in the first section of this post.
Instead of understanding how to manage an environment, these analytical skills allow the data team to understand how to produce bespoke datasets using windowing functions, advanced filtering and complex joins. Matillion translates these skills into components that can be strewn across a canvas to define transformations. This vastly simplifies the transformation process; however, the fundamental knowledge is still necessary.
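For a sense of what these analytical skills look like in raw SQL, the sketch below runs a windowed running total with a filter, using Python’s built-in `sqlite3` module as a stand-in for a cloud data warehouse. The `sales` table and its values are made up for illustration, and the query assumes a SQLite build with window function support (version 3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, sale_day TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("East", "2020-01-01", 100.0),
        ("East", "2020-01-02", 50.0),
        ("West", "2020-01-01", 70.0),
        ("East", "2020-01-03", 25.0),
        ("West", "2020-01-02", 30.0),
    ],
)

# Running total per region, ignoring sales under 30.
# The WHERE filter is applied before the window is computed.
rows = conn.execute(
    """
    SELECT region, sale_day, amount,
           SUM(amount) OVER (
               PARTITION BY region ORDER BY sale_day
           ) AS running_total
    FROM sales
    WHERE amount >= 30
    ORDER BY region, sale_day
    """
).fetchall()

for row in rows:
    print(row)
```

Knowing that the filter runs before the window function, and what that does to the running total, is the kind of fundamental knowledge a visual component cannot supply on its own.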
From business logic to a data model
These analytical SQL skills lay the groundwork for an environment to mature, opening up the need to transform this business logic into a true data model. Data teams at scale typically have at least one individual who is capable of building a traditional star schema or data vault model to store and present the data behind their reporting needs. The primary aspect of these transformations ties back into the very problem that Modern Cloud Analytics aims to solve – delivering true value to the business.
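As a small illustration of the star schema idea, the sketch below builds one fact table and two dimension tables in an in-memory SQLite database and runs a typical reporting query against them. All table names, column names and values are hypothetical. A production model would add surrogate key management, slowly changing dimensions and far more attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimensions hold descriptive attributes.
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL,
        segment TEXT NOT NULL
    );
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,   -- e.g. 20200115
        year INTEGER NOT NULL,
        month INTEGER NOT NULL
    );
    -- The fact table holds measures plus a key into each dimension.
    CREATE TABLE fact_sales (
        customer_key INTEGER NOT NULL REFERENCES dim_customer (customer_key),
        date_key INTEGER NOT NULL REFERENCES dim_date (date_key),
        amount REAL NOT NULL
    );
    INSERT INTO dim_customer VALUES
        (1, 'Acme', 'Enterprise'), (2, 'Birch', 'SMB');
    INSERT INTO dim_date VALUES
        (20200115, 2020, 1), (20200210, 2020, 2);
    INSERT INTO fact_sales VALUES
        (1, 20200115, 500.0), (2, 20200115, 120.0), (1, 20200210, 250.0);
    """
)

# Revenue by customer segment and month: join the fact to its dimensions.
report = conn.execute(
    """
    SELECT c.segment, d.month, SUM(f.amount) AS revenue
    FROM fact_sales AS f
    JOIN dim_customer AS c ON c.customer_key = f.customer_key
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY c.segment, d.month
    ORDER BY c.segment, d.month
    """
).fetchall()

print(report)
```

Because every report joins the same fact table to the same dimensions, business definitions such as "segment" live in one place instead of being re-derived in each query.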
Building your modern data team
As organizations move deeper into 2020, the ecosystem is changing more than ever. While companies look around and ask themselves how to build the right data team, I am confident these principles and skills will help direct their efforts. Modern Cloud Analytics really has changed the game and allows organizations to fail fast and to try new things – concepts that have typically been a barrier for analytics and data projects in general. Matillion will continue to be a pillar for cloud analytics. Using the tips above, you can build a team delivering value to the organization from within that tool. If you have any further questions about how to build out a data team of rock stars to tackle 2020 – and beyond – feel free to reach out to me on LinkedIn or via email. You can also reach out to the broader InterWorks team at our contact us page.
About the author
Holt Calder is a Data Engineer with InterWorks who has seen Matillion grow over the years. Throughout his time with InterWorks, Holt has delivered numerous cloud-based analytics solutions for clients all over the world, and has watched organizations tackle the problem of building the right data team with the right skills.