- 08.02.2018
- Data Fundamentals, Dev Dialogues
Deployment Options in Matillion ETL - Using Multiple Projects
With this method, you create multiple Projects. For example:
- one Project for Development
- one Project for QA
- one Project for Production
You then promote code between Projects when it has been developed and tested.
Orchestration and Transformation Jobs
Matillion is an ELT tool. It does two main things on your behalf:
- Data ingestion - in other words, loading data from external sources into the database
- Data transformation - getting your data ready for visualization or analysis, for example, integration, reshaping, applying business rules, aggregation, and deriving calculated fields.
Matillion Projects
A Project is simply a collection of metadata. When you’re working in Matillion you are always within the context of a Project. When you log in, the Project selection dialog is the first one that pops up after you enter your credentials. You’ll need to select a Group, a Project and a Version. In this example, these are:
- Group: Matillion
- Project: Development
- Version: default
The metadata within a Project includes:
- Orchestration and Transformation Job definitions
- Folder structure
- One or more Environments
What’s a Matillion Environment?
A Matillion Environment defines a target data warehouse. To manage your Environments you’ll need to expand the panel at the bottom left of the screen, which is minimized by default.
When you first launched Matillion, you went through an initial configuration screen. This asked for details of the target data warehouse plus a couple of other configuration items. For this reason, you’ll always have at least one Environment.
You can add, edit and remove Environments through the right-click context menu. The Edit Environment option will take you back to that initial configuration screen, where you can change the settings if necessary. Note that:
- Matillion has an overall cap on the total number of Environments that may exist within your installation. The cap depends upon the instance size. When you use multiple Projects, each Project will have at least one Environment, and they all count towards the cap. You’ll need to switch between Projects to count the total number.
- You are not permitted to delete the last Environment within a Project.
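Because every Environment in every Project counts towards the cap, it can help to keep a simple tally outside the tool. The following is a minimal sketch only: the inventory dictionary, the Project and Environment names, and the cap value are all illustrative assumptions, not values read from Matillion.

```python
# Hypothetical inventory of which Environments exist in each Project.
# Names and the cap are illustrative assumptions, not Matillion settings.
environments_by_project = {
    "Development": ["Dev"],
    "System Test": ["Test"],
    "Production": ["Production"],
}
ENVIRONMENT_CAP = 4  # assumed value; the real cap depends on the instance size


def total_environments(inventory):
    """Count Environments across every Project, since all count towards the cap."""
    return sum(len(envs) for envs in inventory.values())


def remaining_capacity(inventory, cap):
    """Return how many more Environments could be created before reaching the cap."""
    return cap - total_environments(inventory)


total = total_environments(environments_by_project)
print(f"{total} of {ENVIRONMENT_CAP} Environments in use")
```

Keeping the tally in one place avoids having to switch between Projects each time you want to check how close you are to the limit.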
Solution Overview
With this code deployment option:
- You define multiple Projects (development, test, production)
- Every Project owns one corresponding Environment (development, test, production)
- You copy the code from one Project into another once it’s ready to be promoted
For example, a two-tier setup would have:
- A “Development” Project, which owns an Environment named “Dev”, pointing to the Development target data warehouse.
- A “Production” Project, which owns an Environment named “Production”, pointing to the Production target data warehouse.
Code can be promoted along paths such as:
- Dev → Production
- Dev → System Test → Production
- Dev → System Test → UAT → Production
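The promotion paths listed above can be modelled as ordered lists of Project names, which makes it easy to work out where code should go next. This is a rough sketch: the dictionary keys and the `next_project` helper are hypothetical names introduced for illustration, and nothing here calls Matillion itself.

```python
# Promotion paths from the list above, modelled as ordered lists of Project names.
# The dictionary keys are hypothetical labels chosen for this sketch.
PROMOTION_PATHS = {
    "two-tier": ["Dev", "Production"],
    "three-tier": ["Dev", "System Test", "Production"],
    "four-tier": ["Dev", "System Test", "UAT", "Production"],
}


def next_project(path, current):
    """Return the Project that code in `current` is promoted to, or None at the end."""
    index = path.index(current)  # raises ValueError if `current` is not on the path
    return path[index + 1] if index + 1 < len(path) else None


print(next_project(PROMOTION_PATHS["four-tier"], "System Test"))
```

Writing the path down explicitly also documents that code should not skip a stage, for example going straight from Dev to Production when a System Test Project exists.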
Exporting and Importing Jobs
Matillion’s mechanism for copying job metadata from one Project into another is the Export/Import feature.
While editing a Job, you can find the Export option in the right-click context menu. You can also get to the same dialog from the Project / Export menu. The dialog allows you to select one or more Jobs you want to save and download as a JSON file. You can put this file into external version control.
From the Project / Import menu, you can import a Matillion-generated JSON file, and choose which Jobs to import into the target Project.
By default, the jobs will be added into the same folder structure as they were at the source. You’ll find it easiest to maintain exactly the same folder structure in Development as in Production.
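If you do keep exported JSON files in external version control, it can be useful to normalise them before committing so that diffs show real changes rather than differences in key order or whitespace. The sketch below assumes only that the export is valid JSON. It makes no assumption about the internal structure of a Matillion export, and the sample documents are made up for illustration.

```python
import json


def canonical_json(text):
    """Re-serialise JSON text with sorted keys and fixed indentation."""
    parsed = json.loads(text)
    return json.dumps(parsed, sort_keys=True, indent=2, ensure_ascii=False) + "\n"


# Two made-up documents with the same content but different key order.
# They are NOT the real export schema, just stand-ins for this sketch.
first = '{"name": "load_orders", "type": "orchestration"}'
second = '{"type": "orchestration", "name": "load_orders"}'

# After normalisation the two texts are identical, so a diff would be empty.
print(canonical_json(first) == canonical_json(second))
```

It is safest to treat the normalised copy as a review aid and keep the original download for importing, since this sketch has not been checked against how Import handles reformatted files.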
A note on Environment Variables
Environment Variables have a very useful feature in that you can set a different default per Environment. While logged into the Development Project, you might have a variable named target_schema. Its default value (for the Dev Environment) is set to dev_schema. Similarly, while logged into the Production Project, the same variable can exist, but with a default value (for the Production Environment this time) set to prod_schema.
So, instead of hardcoding the schema name of a Table Input component, for example, you could set it to ${target_schema}. Then, in Development the Table Input would automatically read from dev_schema, and once deployed into the Production Project it would automatically start to read from prod_schema with no code change required.
Incidentally, this is why, when you export an Environment Variable, the default value is not among the properties that are saved into the JSON export file.
Summary
When you use multiple Projects to manage code deployment, you set up multiple parallel copies of the codebase, each in a different Project and each pointing to a different target data warehouse (dev, test or production). You control exactly when and what code you promote to the different environments. You can also take advantage of the Environment Variable feature, which allows you to automatically customize job behavior according to where it is running.
Backup mechanisms work as normal and are still recommended. Be aware of Matillion ETL's cap on the total number of Environments that you can create, because this applies across all Projects simultaneously.
Some metadata items, including user logins, OAuth credentials, and the Password Manager, are not Project-specific. These are server-wide and automatically apply to every Project. More on this subject in the third article in this series.
Other Methods of Code Deployment
- Using Multiple Environments
- Using Multiple Instances
Ian Funnell
Data Alchemist
Ian Funnell, Data Alchemist at Matillion, curates The Data Geek weekly newsletter and manages the Matillion Exchange.