- Blog
- 08.29.2024
- Data Fundamentals, Company
Lacing up: An introduction to dbt Core and Matillion Data Productivity Cloud

The Matillion Data Productivity Cloud serves as the integration platform for users' data and the tools used to derive value from that data. Within that scope, users can find native functionality to run their dbt processes inside a Matillion pipeline. In this series of articles - Stepping into Data with Matillion and dbt - readers will learn why they should integrate dbt with the Matillion Data Productivity Cloud, along with best practices and worked examples showing how to do so.
Let’s start with why
Simon Sinek would be so proud. Every data-driven organization is in a race toward data productivity, competing against the opponents of outdated practices and the absence of a data management strategy.
It just so happens that Matillion serves as the data productivity platform for such organizations. More specifically, it provides native ELT (extract + load + transform) functionality for the Cloud Data Platforms Snowflake, Databricks, and Redshift. Matillion offers a long list of out-of-the-box connectors and also lets users build custom connectors, at no added cost, for use in a pipeline. Once data has landed in one of those Cloud Data Platforms, transformation is achieved by pushing down SQL native to the platform via Matillion's low-code/no-code Transformation pipelines.
I know what you're thinking: But wait a second… dbt is used to transform data. Indeed! While Matillion provides the flexibility to build low-code/no-code Transformation pipelines, users of those Cloud Data Platforms can also use Matillion as the thread that ties all of their data management processes together in an easy-to-understand visual format. That's where dbt comes in: Matillion includes a native dbt Core component, which lets users bring all of dbt's analytics engineering functionality to bear on their data. This flexibility lets users "choose their own adventure," with Matillion as the platform that centralizes those choices.
High-level architecture
The above diagram gives a high-level view of the Matillion Data Productivity Cloud architecture; dbt is most closely associated with the bottom section of the diagram, the Matillion or Customer-Hosted Agent. As it relates to dbt, the diagram communicates the following:
- A dbt project has been already developed using either dbt Core or Cloud, and this project has been made available in a Git repository.
- At pipeline runtime, the dbt Core component in Data Productivity Cloud "syncs" the Matillion Agent with the Git repository containing the dbt files; more specifically, the files in Git are stored locally on the agent, ready to have any supported dbt command run against them.
- The profiles.yml file, normally stored locally on a machine running dbt Core or within dbt Cloud, is instead generated by Matillion, out of view of the user. This compiled file incorporates the details of the Matillion Environment, which describes the Cloud Data Platform configuration. The dbt Core component also exposes a Global configuration in profiles.yml property that lets users append their own configurations to this compiled profiles.yml file.
- The dbt command(s) issued by the user in the component identify which models or other functionality to run, and Matillion facilitates pushing the SQL (or Python) down to the target Cloud Data Platform of choice. The result is the same as if the commands were run in applications native to dbt.
- dbt logs are presented to the user in the Designer UI, communicating the status of the running of models and any other functionality that was run.
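The compiled profiles.yml that Matillion generates is not exposed to the user, but as a hedged illustration, it might resemble the following for a Snowflake target. The field names follow dbt's standard Snowflake profile format; every value shown is a placeholder drawn from the Matillion Environment, not the actual file Matillion produces:

```yaml
# Hypothetical sketch of a compiled profiles.yml for a Snowflake target.
# Matillion assembles this from the Environment configuration at runtime;
# the real file is never shown to the user.
default:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: my_account        # from the Matillion Environment
      user: my_user
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: my_role
      warehouse: my_warehouse
      database: my_database
      schema: my_schema
      threads: 4
```

Anything a user enters into the Global configuration in profiles.yml property would be merged into a file of this shape.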
Lace-up those kicks
Let’s get a simple working example of the dbt integration in Matillion. In this scenario, an owner of a fictional shoe company called Charlie’s Shoe Emporium wishes to build a pipeline to load their sales data for retail stores located in Florida. In this workflow, our user wishes to:
- Load shoe sales data from AWS S3 to Snowflake
- Select only a handful of columns from the dataset and parse the dates into years and months using dbt
- Materialize this new dataset as a view
- Create another view that aggregates this data to find the most sales by location, month, and year
By the way, this example and the more complex ones referenced in subsequent articles can be found in the Labs directory of the Best Practice Pipelines GitHub repository. Load the assets to follow along with this example in your own environment!
Load Charlie’s Shoe Emporium Sales Data
If following along, open the dbt Parent Pipeline from within the Labs / Stepping into Data with Matillion and dbt / 1-Lacing Up folder. Without going too in-depth: the pipeline loads three files from AWS S3 into tables in Snowflake. A Transformation pipeline then joins the tables and, using the Python Pushdown component, adds a LAST_UPDATED column set five days in the past. The resulting dataset is loaded into a table called SRC_CHARLIES_SHOE_EMPORIUM_SALES.
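The exact script inside the Python Pushdown component isn't reproduced here, but a minimal sketch of its date logic (computing a LAST_UPDATED value five days in the past) might look like this; the function name is illustrative:

```python
from datetime import date, timedelta

def last_updated(days_back: int = 5) -> str:
    """Return an ISO-formatted date the given number of days in the past."""
    return (date.today() - timedelta(days=days_back)).isoformat()

# In the pipeline, this value would populate the LAST_UPDATED column
# of SRC_CHARLIES_SHOE_EMPORIUM_SALES.
print(last_updated())
```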
The user runs the Load charlies_shoe_emporium data pipeline, then inspects the resulting SRC_CHARLIES_SHOE_EMPORIUM_SALES table using the data sample of the last component in the Transformation pipeline.
Each row in the sample corresponds to a sale with related information on the location, ID, date of sale, product name, whether a review was supplied by the purchaser or not, and the qualities of the shoe purchased. Here are the first three rows in the resulting dataset:
| LAST_UPDATED | STORE_LOCATION | TRANSACTION_ID | TRANSACTION_DATE | etc... |
| --- | --- | --- | --- | --- |
| 2024-07-31 | Coral Springs | 25D7BD66744 | 2024-03-04 | |
| 2024-07-31 | St. Petersburg | 34EB760E182F | 2018-04-02 | |
| 2024-07-31 | Jacksonville | 1F378FBC1A4E | 2017-04-13 | |
dbt Models performing analytics on the shoe sale data
With the raw dataset loaded into the SRC_CHARLIES_SHOE_EMPORIUM_SALES table via Matillion, the user will put to use a dbt project stored in a subdirectory of a public GitHub repository to accomplish the above objectives. Note how the two .sql files make use of references: STG_CHARLIES_SALES references the source, and SALES_BY_LOCATION_FL references a relation to the model STG_CHARLIES_SALES. In order of dependencies, here is what each file accomplishes:
- sources.yml
- The user is pointing at the SRC_CHARLIES_SHOE_EMPORIUM_SALES table created and loaded via Matillion. Note the use of the environment variables for the database and schema properties.
- STG_CHARLIES_SALES.sql
- The first model to be created. Using the source defined in the sources.yml file, the user materializes a view containing 11 columns. A new column labeled STATE is created with the value FL, and the year and month of the TRANSACTION_DATE column are parsed out into their own respective columns.
- SALES_BY_LOCATION_FL.sql
- The user performs an aggregation of the STG_CHARLIES_SALES view, grouping the data by location, year, and month. The materialized view is ordered by sales count descending across those three fields.
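The actual model files live in the repository referenced above; as a rough sketch only (the source name and any columns beyond those mentioned are illustrative, and the two files are shown in one listing for brevity), the models might look like:

```sql
-- STG_CHARLIES_SALES.sql (sketch): materialize a view over the source,
-- keeping a subset of columns, adding STATE, and parsing TRANSACTION_DATE.
{{ config(materialized='view') }}

select
    store_location,
    transaction_id,
    transaction_date,
    'FL' as state,
    year(transaction_date)  as transaction_year,
    month(transaction_date) as transaction_month
from {{ source('charlies_shoe_emporium', 'SRC_CHARLIES_SHOE_EMPORIUM_SALES') }}

-- SALES_BY_LOCATION_FL.sql (sketch): aggregate the staging view by
-- location, year, and month, ordered by sales count descending.
{{ config(materialized='view') }}

select
    store_location,
    transaction_year,
    transaction_month,
    count(*) as total_sales
from {{ ref('STG_CHARLIES_SALES') }}
group by store_location, transaction_year, transaction_month
order by total_sales desc
```

The `source()` and `ref()` calls are what create the dependency chain dbt resolves at run time.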
Run the dbt Models in Data Productivity Cloud
With the dbt project reviewed, our user is now ready to orchestrate its execution in Matillion! They click into the Lacing Up Getting Started with Matillion and dbt pipeline, and drag and drop a Run dbt Core component onto the canvas.
Upon placement of the Run dbt Core component, a property pane opens on the right side of the UI. Setup is straightforward; two main categories of configuration need to be filled out by the user: connecting to the dbt project in Git (Git URL; Git Branch; Git Folder Path; Git Username; Git Password) and properties governing the actual running of the dbt project (dbt Command; Map dbt Environment Variables; Global configuration in profiles.yml; Profile configuration in profiles.yml).
Before running any of the models, our user wishes to simply validate the setup of the dbt project. In doing so, they set up the properties as follows:
- dbt Command: dbt debug
- Git URL: https://github.com/MatillionPartnerEngineering/Stepping-into-Data-with-Matillion-and-dbt/
- Git Branch: kg
- Git Folder Path: blank
- Git Username: Since this is a public repo, any GitHub credentials can be used. The user enters the username associated with their GitHub account.
- Git Password: The user saves a GitHub Personal Access Token associated with their Git Username as a Secret within Data Productivity Cloud, and references it in this field.
- Map dbt Environment Variables: Remember how the sources.yml file in the dbt project contains two environment variables (DBT_TARGET_DATABASE; DBT_TARGET_SCHEMA)? Here our user sets the database and schema where the SRC_CHARLIES_SHOE_EMPORIUM_SALES table is located.
- Global configuration in profiles.yml - blank
- Profile configuration in profiles.yml - blank
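For reference, the pattern sources.yml uses to consume those mapped variables follows dbt's standard `env_var` syntax; a rough sketch (the source name here is illustrative) looks like:

```yaml
# sources.yml (sketch): database and schema are resolved at runtime from
# the environment variables mapped in the Matillion component.
version: 2

sources:
  - name: charlies_shoe_emporium   # illustrative name
    database: "{{ env_var('DBT_TARGET_DATABASE') }}"
    schema: "{{ env_var('DBT_TARGET_SCHEMA') }}"
    tables:
      - name: SRC_CHARLIES_SHOE_EMPORIUM_SALES
```

This is what lets the same dbt project run unchanged against different databases and schemas across environments.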
With the configurations set, the user runs the pipeline, which consists only of the Run dbt Core component. The run succeeds, and opening the dbt logs shows the configuration of dbt running locally, as well as the Cloud Data Platform configuration.
Our user has validated the setup of the dbt project and its execution within Matillion. They make a copy of the existing dbt component, connect the new component with a Success connector, set its dbt command to dbt run, and run the pipeline once again. This reruns the dbt debug command and then runs the models defined in the dbt project. Note that the Success connector enforces the dependency: if the debug command were to fail, the pipeline would not proceed to the dbt run command.
Success! Having already validated the setup of the dbt project within Matillion, the user has now confirmed that the models ran successfully, as the log returned from dbt shows once again:
Conclusion
Matillion and dbt make for a potent combination in the race to data productivity. Through the native components in Matillion's Data Productivity Cloud, existing dbt projects can be orchestrated and easily configured. This basic workflow also showed how multiple dbt commands can be strung together, with embedded dependencies, through the use of connectors.
But this is just the first leg of the race. In the second installment of the Stepping into Data with Matillion and dbt series on intermediate topics, the following will be covered:
- Loading and using custom dbt Packages
- dbt Tests and advanced application of dependencies using connectors
- Preparation for AI workloads
- Concurrent and sequential commands
- dbt node selection
Thank you for reading, and I look forward to seeing you down the road!
Karey Graham
Snr Manager of Tech Services - PS & Partners