How to architect data pipelines by following the Single Responsibility Principle

In this blog, you’ll learn how to architect reliable data pipelines in Matillion that are highly maintainable and easy to debug by applying the Single Responsibility Principle.

The Single Responsibility Principle is a guiding architectural principle originally presented to the software engineering community by Robert “Uncle Bob” Martin. Uncle Bob describes its practical application as: “Gather together the things that change for the same reasons. Separate those things that change for different reasons.”

In Matillion, all data work is done inside two types of data pipelines: Orchestration Pipelines and Transformation Pipelines.

These pipelines (particularly Orchestration Pipelines) are exceptionally flexible in the types of tasks they can handle. To reduce their complexity, making them more readable and maintainable, we’ll inspect a pipeline architecture driven by the Single Responsibility Principle.

Data pipelines for sales reporting

In this exercise, we’ll build a data pipeline that fuels a sales reporting dashboard. For that, we need a data pipeline that:

  • Loads raw data into Snowflake from three sources:
    • Salesforce
    • a Postgres database
    • a JSON file in S3
  • Transforms and models raw data to prepare it for analytics
  • Runs every weekday, except on bank holidays (public holidays for US readers 😀).

Orchestrating data loading to avoid holidays

First, we will orchestrate the whole process by querying a Snowflake table that contains all of this year's dates that are NOT bank holidays. 

This way, if it’s not a bank holiday, we will execute our data pipelines. This pipeline should only change if we need to change our report scheduling.
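The scheduling check can be sketched in plain Python. This is a hypothetical stand-in for the Snowflake date-table query; `BANK_HOLIDAYS` and `should_run` are illustrative names, and the holiday dates are assumed examples:

```python
from datetime import date

# Illustrative sketch only: in Matillion this would be a query against a
# Snowflake table of working days; here the calendar is a hard-coded set.
BANK_HOLIDAYS = {          # assumed dates, for illustration
    date(2024, 12, 25),    # Christmas Day
    date(2024, 12, 26),    # Boxing Day
}

def should_run(run_date: date, holidays: set = BANK_HOLIDAYS) -> bool:
    """Run only on weekdays (Mon-Fri) that are not bank holidays."""
    is_weekday = run_date.weekday() < 5  # 0 = Monday .. 4 = Friday
    return is_weekday and run_date not in holidays
```

Because this check lives in its own pipeline, a change to the report schedule touches only this one place.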

Next, we have a pipeline built purely to execute the pipelines that do the work needed to refresh our sales report. This pipeline should only change if we need to add more steps to create our sales report.
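A minimal sketch of this “thin orchestrator” idea, assuming `run_pipeline` stands in for Matillion's Run Orchestration/Run Transformation components (the pipeline names are also assumptions):

```python
# Hypothetical sketch: the parent pipeline's only job is sequencing.
# run_pipeline and the step names are illustrative, not Matillion APIs.
def refresh_sales_report(run_pipeline) -> list:
    steps = ["Load Raw Data", "Transform Sales Data"]
    for step in steps:
        run_pipeline(step)  # delegate the real work to child pipelines
    return steps
```

Adding a step to the report refresh means editing only this list, which is exactly the single reason this pipeline should change.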

Implementing the Single Responsibility Principle with data pipelines

Now, we are nearing the lowest level of responsibility. We have one Orchestration Pipeline to load data using a few load components. This pipeline should only change if we need to add more sources of data to fuel our sales report or change which data we collect from an existing source.
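As a sketch, the load pipeline's single responsibility reduces to a source-to-target mapping. The source keys and target table names below are assumptions, and `load` stands in for Matillion's load components:

```python
# Hypothetical sketch: one entry per raw source feeding the sales report.
# Adding a source, or changing what we pull from one, is the only reason
# this pipeline should change.
SOURCES = {
    "salesforce": "raw.sf_opportunities",  # assumed target table names
    "postgres":   "raw.pg_orders",
    "s3_json":    "raw.s3_events",
}

def load_raw_data(load, sources: dict = SOURCES) -> int:
    for source, target in sources.items():
        load(source, target)  # load-component stand-in
    return len(sources)
```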

Orchestration vs transformation data pipelines

We also have one Orchestration Pipeline that runs three separate Transformation Pipelines in sequence, turning raw data into report-ready data. This pipeline should only change if we need to change the underlying SQL logic that builds our sales report.
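As an illustration of splitting the transformation logic into stages, here is a two-stage Python sketch (function names and fields are hypothetical; in Matillion each stage would be its own Transformation Pipeline expressed in SQL):

```python
# Hypothetical sketch: each function models one Transformation Pipeline.
def stage_clean(rows):
    """Drop records with no sale amount (models a filter component)."""
    return [r for r in rows if r.get("amount") is not None]

def stage_aggregate(rows):
    """Total sales per region (models an aggregate component)."""
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0) + r["amount"]
    return totals

def build_report(rows):
    """Run the stages in order to produce report-ready data."""
    return stage_aggregate(stage_clean(rows))
```

Each stage changes for its own reason: cleaning rules in one place, reporting aggregations in another.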

Conclusion

In this blog, we reviewed a data pipeline that’s built in a modular way following the Single Responsibility Principle. The individual modules represent pipeline requirements that would change for the same reason:

  • Scheduling
  • Order of operations orchestration
  • Data loading
  • Data analysis/modeling

If you’re struggling to build data pipelines that are easy to read and maintain, try Matillion! And be sure to use the Single Responsibility Principle when you do.

Angus Kinsey

Enterprise Solutions Engineer
