Collaborating on Data Pipelines using git in the Matillion Data Productivity Cloud

Seamless collaboration is vital to the success of data engineering projects. The Matillion Data Productivity Cloud builds git integration directly into its development environment so that teams can collaborate closely on data pipelines.

Because git version control is embedded directly in its workflows, the Data Productivity Cloud lets developers work on the same project concurrently. Shared updates, conflict resolution, and streamlined branching let team members focus on building pipelines instead of coordinating who changes what.

User access metamodel

To start using Matillion's git collaboration features, invite your colleagues to become members of your Data Productivity Cloud account.

Accounts can include multiple projects, and account members can be granted specific access roles for any project within the account. These roles, such as Admin or Read-Only, determine the actions the user is permitted to perform in each project.
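The account, project, and role relationship described above can be pictured as a small data model. The following is a minimal sketch, not Matillion's implementation: the `Account` class, the user and project names, and the permission sets attached to "Admin" and "Read-Only" are all hypothetical and only illustrate that membership is per account while roles are granted per project.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical permission sets for illustration only; the real Data
# Productivity Cloud roles and their exact permissions may differ.
ROLE_PERMISSIONS = {
    "Admin": {"view", "edit", "commit", "push", "merge", "manage_members"},
    "Read-Only": {"view"},
}


@dataclass
class Account:
    name: str
    members: set[str] = field(default_factory=set)
    # project name -> {user name -> role name}
    project_roles: dict[str, dict[str, str]] = field(default_factory=dict)

    def invite(self, user: str) -> None:
        """Make the user a member of the account (no project access yet)."""
        self.members.add(user)

    def grant(self, user: str, project: str, role: str) -> None:
        """Give an existing account member a role on one specific project."""
        if user not in self.members:
            raise ValueError(f"{user} is not a member of account {self.name}")
        if role not in ROLE_PERMISSIONS:
            raise ValueError(f"unknown role: {role}")
        self.project_roles.setdefault(project, {})[user] = role

    def can(self, user: str, project: str, action: str) -> bool:
        """True if the user's role on this project permits the action."""
        role = self.project_roles.get(project, {}).get(user)
        return role is not None and action in ROLE_PERMISSIONS[role]


acct = Account("acme")
for person in ("alice", "bob"):
    acct.invite(person)
acct.grant("alice", "sales_pipelines", "Admin")
acct.grant("bob", "sales_pipelines", "Read-Only")
acct.grant("bob", "marketing_pipelines", "Admin")

print(acct.can("bob", "sales_pipelines", "push"))      # False
print(acct.can("bob", "marketing_pipelines", "push"))  # True
```

The same person can hold different roles in different projects, which is why the sketch stores roles per project rather than per account.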

Concurrency and collaboration with git

As a distributed version control system, git allows multiple developers to work on the same project simultaneously without overwriting each other's changes. 

Git's branching and merging features let developers integrate each other's contributions and resolve conflicts while keeping a full history of every change. This workflow is built into the core of the Matillion Data Productivity Cloud.

Data pipeline development is most commonly done in a feature branch. Each branch represents a complete version of every file in the repository, so work in one branch does not affect other branches until it is merged.

Suppose User A modifies their local copy of the repository. To keep things simple, the Data Productivity Cloud stages changes automatically, so there is no need for a manual `add` action.

At a stable point in development, User A commits the changes with a descriptive (or AI-generated) message.
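The branch-and-commit steps above can be modelled with a toy in-memory repository. This is a simplified sketch, assuming commits are stored as full snapshots of every file and a branch is simply a name pointing at its latest commit; real git stores content-addressed blobs and trees, and the `Repo` class, file names, and branch name here are hypothetical. In a plain git command line, the equivalent steps would be `git checkout -b <branch>`, `git add`, and `git commit -m "<message>"`; the sketch folds the `add` step into `commit` to mirror the automatic staging.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Commit:
    id: str
    parent: str | None
    message: str
    files: dict[str, str]  # complete snapshot: path -> content


@dataclass
class Repo:
    commits: dict[str, Commit] = field(default_factory=dict)
    # A branch is just a name pointing at its latest commit (None = empty).
    branches: dict[str, str | None] = field(default_factory=lambda: {"main": None})

    def checkout(self, branch: str) -> dict[str, str]:
        """Return a working copy holding every file as of the branch head."""
        head = self.branches[branch]
        return dict(self.commits[head].files) if head else {}

    def create_branch(self, name: str, start: str = "main") -> None:
        """Start a new branch at the same commit as an existing one."""
        self.branches[name] = self.branches[start]

    def commit(self, branch: str, working_copy: dict[str, str], message: str) -> str:
        """Record the whole working copy as a new commit on the branch.

        Every change is included automatically (no separate "add" step).
        """
        parent = self.branches[branch]
        payload = json.dumps([parent, message, working_copy], sort_keys=True)
        commit_id = hashlib.sha1(payload.encode()).hexdigest()[:8]
        self.commits[commit_id] = Commit(commit_id, parent, message, dict(working_copy))
        self.branches[branch] = commit_id
        return commit_id


repo = Repo()
repo.commit("main", {"load_sales.yaml": "v1", "transform_sales.yaml": "v1"}, "Initial pipelines")

repo.create_branch("feature/filter-test-orders")
work = repo.checkout("feature/filter-test-orders")  # full copy of every file
work["transform_sales.yaml"] = "v2: filter out test orders"
repo.commit("feature/filter-test-orders", work, "Filter out test orders")

print(repo.checkout("main")["transform_sales.yaml"])                       # v1
print(repo.checkout("feature/filter-test-orders")["transform_sales.yaml"])  # v2: filter out test orders
```

Because the feature branch only moves its own pointer, `main` keeps seeing the original files until the work is merged.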

With the changes committed locally, User A then runs "push local changes" to upload their branch to the remote server and share their progress with team members.

Now the focus switches to User B, perhaps in a different location or time zone. On being notified of the new changes, User B fetches them by running "pull remote changes", which brings their local repository up to date with all the latest contributions.
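The push and pull exchange between User A and User B can be sketched with histories represented as lists of commit ids. This is a hypothetical model, not the commands the Data Productivity Cloud runs internally: in plain git these actions correspond roughly to `git push` and `git pull`. It models only fast-forward updates; a real `git pull` fetches and then merges (or rebases) when histories have diverged, which the sketch simply reports. The `Clone` class and the commit ids `c1` to `c4` are placeholders.

```python
from __future__ import annotations


class PushRejected(Exception):
    """The remote branch has commits that the local branch does not."""


def extends(old: list[str], new: list[str]) -> bool:
    """True if `new` is `old` plus zero or more extra commits (a fast-forward)."""
    return new[: len(old)] == old


class Clone:
    """One developer's local copy of a shared remote (branch -> commit history)."""

    def __init__(self, remote: dict[str, list[str]]):
        self.remote = remote
        self.local = {branch: list(history) for branch, history in remote.items()}

    def commit(self, branch: str, commit_id: str) -> None:
        self.local.setdefault(branch, []).append(commit_id)

    def push(self, branch: str) -> None:
        """Upload local commits, refusing to overwrite commits we have not seen."""
        if not extends(self.remote.get(branch, []), self.local[branch]):
            raise PushRejected(f"remote {branch} has new commits; pull first")
        self.remote[branch] = list(self.local[branch])

    def pull(self, branch: str) -> None:
        """Bring the local branch up to date (fast-forward only in this sketch)."""
        remote_history = self.remote[branch]
        local_history = self.local.get(branch, [])
        if extends(local_history, remote_history):
            self.local[branch] = list(remote_history)
        elif not extends(remote_history, local_history):
            raise RuntimeError("histories have diverged; a real pull would merge")
        # otherwise the local branch is already ahead of the remote: nothing to do


server = {"feature/filter-test-orders": ["c1"]}
user_a, user_b = Clone(server), Clone(server)

user_a.commit("feature/filter-test-orders", "c2")
user_a.push("feature/filter-test-orders")   # "push local changes"
user_b.pull("feature/filter-test-orders")   # "pull remote changes"
print(user_b.local["feature/filter-test-orders"])  # ['c1', 'c2']

user_b.commit("feature/filter-test-orders", "c3")
user_b.push("feature/filter-test-orders")
user_a.commit("feature/filter-test-orders", "c4")  # A has not pulled c3 yet
try:
    user_a.push("feature/filter-test-orders")
except PushRejected as err:
    print("push rejected:", err)
```

The rejected push illustrates why the pull step matters: git will not let one developer silently overwrite commits that a colleague has already shared.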

User B can then view, test, edit, and even rework their colleagues' changes while remaining in the same git branch.

Before fully integrating these modifications, User B can use `git diff` to inspect the differences between versions. The comparison highlights lines that have been added, modified, or removed, giving a detailed view of how the project has evolved since the last update.
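To see what such a comparison looks like, Python's standard `difflib.unified_diff` produces the same unified diff format that `git diff` prints (git adds its own `diff --git` and `index` header lines, which are omitted here). The file name and contents below are hypothetical stand-ins for a pipeline definition file; a modified line would appear as a `-` line followed by a `+` line.

```python
import difflib

# Hypothetical contents of one pipeline definition file, before and after
# a colleague's commit. Any text file is compared the same way.
before = [
    "name: transform_sales\n",
    "steps:\n",
    "  - read sales\n",
    "  - aggregate by region\n",
]
after = [
    "name: transform_sales\n",
    "steps:\n",
    "  - read sales\n",
    "  - filter out test orders\n",
    "  - aggregate by region\n",
]

diff = list(difflib.unified_diff(before, after,
                                 fromfile="a/transform_sales.yaml",
                                 tofile="b/transform_sales.yaml"))
print("".join(diff), end="")

# Lines starting with "+" or "-" (excluding the two file headers) are
# additions and removals; lines starting with a space are unchanged context.
added = [line[1:] for line in diff if line.startswith("+") and not line.startswith("+++")]
removed = [line[1:] for line in diff if line.startswith("-") and not line.startswith("---")]
```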

Branch merging with git

At some point, developers and testers will agree that the work is complete. An integrator, such as a project administrator, can then take charge of merging these collaborative changes.

The administrator incorporates the new development work into the main branch by running a "merge from branch" operation. Before doing so, the team may require further process steps, such as approvals, and open a pull request for discussion and review. Pull requests are not part of git itself; they are a feature provided by git hosting services such as GitHub, GitLab, and Bitbucket.

Once the change is approved, the final merge into the main branch can take place. The main branch then holds the combined work of all collaborators, and the git history records every step that led there, preserving the integrity of the project.
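A "merge from branch" corresponds roughly to running `git merge <branch>` while on the main branch. Git performs a three-way merge: it compares both branches against their common ancestor (the merge base), takes whichever side changed, and flags a conflict when both sides changed the same content differently. The sketch below illustrates that rule at whole-file granularity; real git merges line by line within each file, and the function name and snapshots are hypothetical.

```python
from __future__ import annotations


def three_way_merge(base: dict[str, str], ours: dict[str, str], theirs: dict[str, str]):
    """Merge two branch snapshots against their common ancestor, file by file.

    Returns (merged_snapshot, conflicting_paths). Real git merges line by line
    within each file; this sketch treats a whole file as one unit. A missing
    path means the file does not exist (or was deleted) on that side.
    """
    merged: dict[str, str] = {}
    conflicts: list[str] = []
    for path in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree, including both deleting the file
            result = o
        elif o == b:      # only the incoming branch changed this file
            result = t
        elif t == b:      # only our branch changed this file
            result = o
        else:             # both changed it differently: a human must decide
            conflicts.append(path)
            result = o    # keep our version until the conflict is resolved
        if result is not None:
            merged[path] = result
    return merged, conflicts


# Hypothetical snapshots: the common ancestor, main (someone added a report
# pipeline), and the feature branch (the load pipeline was edited).
base = {"load_sales.yaml": "v1", "transform_sales.yaml": "v1"}
main = {"load_sales.yaml": "v1", "transform_sales.yaml": "v1", "report.yaml": "v1"}
feature = {"load_sales.yaml": "v2", "transform_sales.yaml": "v1"}

merged, conflicts = three_way_merge(base, main, feature)
print(merged)     # {'load_sales.yaml': 'v2', 'report.yaml': 'v1', 'transform_sales.yaml': 'v1'}
print(conflicts)  # []
```

Because each side changed a different file, the merge keeps both contributions with no conflicts; had both branches edited `load_sales.yaml` differently, it would be reported for manual resolution.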

Further reading

You have seen how Matillion's close git integration enhances productivity and helps to foster an environment where collaboration thrives.


If you'd like to run your own trial of the Matillion Data Productivity Cloud, start by going to the Matillion Hub and creating an account.

Ian Funnell

Data Alchemist

Ian Funnell, Data Alchemist at Matillion, curates The Data Geek weekly newsletter and manages the Matillion Exchange.
Follow Ian on LinkedIn: https://www.linkedin.com/in/ianfunnell
