How to Create a Data Pipeline Using Matillion Data Loader
If you need to get data into the cloud quickly and cost-effectively, Matillion Data Loader provides a low-friction SaaS platform for data integration. Technical and non-technical users alike can use Matillion Data Loader to bring data in from a variety of sources by creating simple data pipelines.
How to create a data pipeline in a few steps
How simple is simple? In just a few steps and a few minutes, you’re ready to bring data into the cloud. Let’s take a look at how to create a data pipeline to load data into your cloud data warehouse using Matillion Data Loader.
1. Start the process
Once you’re logged in to Matillion Data Loader, you can add a pipeline. Click on the green plus sign on your home screen to start the process.
2. Select a data source
Once you click the plus sign, you’ll be asked to select from a variety of data sources, including Google Analytics, MySQL, and Salesforce. Choose your desired data source.
3. Select a destination
Now that you’ve selected your data source, you’ll want to select a target destination for that data. Choose a cloud data warehouse (CDW): Amazon Redshift, Snowflake, or Google BigQuery.
You’ll also be asked to name your pipeline.
Once you’ve selected your destination, you’ll need to configure it. To do this, click on the manage button to add your credentials. You will only need to do this once. For each additional pipeline, you will be able to select your pre-configured target from the drop-down.
4. Set up the connection to your data source
Before you can start loading data from your data source to your target CDW, you need to provide connection details for the data source. Depending on your data source, you’ll need to either supply your login credentials (a username and password) or create an OAuth entry.
To create an OAuth entry for the data source, choose the source from a drop-down menu.
You’ll also be asked to name the OAuth entry.
Then you can configure your OAuth entry.
First, click Log Into Service.
Next, log into your account and grant Matillion access to your data. Matillion will do the rest of the work.
Finally, go back to connection details and choose the OAuth you’ve just configured. Then you’ll be connected!
5. Choose objects for your pipeline
Once you are connected to your data source, you will see a list of data objects or tables that you can include in your pipeline. These objects hold information such as accounts, contacts, leads, and opportunities in Salesforce, or site activity in Google Analytics. Select the objects that you want to include in your pipeline. You can also load any custom fields you’ve created.
6. Choose columns within your objects
Once you choose your objects, you’ll see the columns identified within those objects. Columns include details such as Name, Address, Company, and more. By default, Matillion Data Loader includes all of the columns in each object. Click on an object to adjust the columns of information you want to load.
There will likely be columns of information that you don’t want to move into your cloud data warehouse (for example, you want names and cities, but not phone numbers). You can select those columns and move them out of the right-side window. It’s also possible to multi-select several columns at once to move them.
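If it helps to picture what column selection does, here is a minimal Python sketch. It is not Matillion Data Loader code: the function `select_columns` and the column names are hypothetical, chosen only to mirror the names-and-cities example above. The idea is simply that every column starts out included and you remove the ones you don't want.

```python
# Illustration only: a hypothetical helper, not part of Matillion Data Loader.
def select_columns(all_columns, excluded):
    """Return the columns to load, keeping their original order."""
    excluded_set = set(excluded)
    return [col for col in all_columns if col not in excluded_set]


# Hypothetical columns for a contact-style object.
contact_columns = ["Name", "Address", "City", "Company", "Phone"]

# Everything is included by default; drop the columns you don't want to load.
to_load = select_columns(contact_columns, excluded=["Phone", "Address"])
print(to_load)
```

Excluding columns rather than listing the ones to keep matches the include-everything default described in this step.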
7. Configure how you want your data to load
Once you know what data you want to move, you can configure how you want that data to load.
In this window, you can:
- Choose a staging area where your data will load
- Create and name a staging table for your data
- Choose a stage schema
- Create and name a target table for your data
- Select a target schema
- Choose whether you want data to load sequentially or concurrently
- Select your target distribution style
A lot of this information will pre-populate based on the target CDW that you choose. The Matillion Data Loader community has information on how you can configure Amazon Redshift, Snowflake, and Google BigQuery.
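To keep track of the choices in this window, you could think of them as one small settings record. The Python sketch below is an assumption-laden illustration, not a Matillion Data Loader format or API: the class `LoadConfig`, its field names, and the sample values are all hypothetical. The distribution style field is most relevant to Amazon Redshift, which defines the styles AUTO, EVEN, KEY, and ALL.

```python
from dataclasses import dataclass, fields


# Illustration only: a hypothetical record of the options listed above.
@dataclass
class LoadConfig:
    staging_area: str
    stage_schema: str
    staging_table: str
    target_schema: str
    target_table: str
    concurrent: bool = False          # False = load sequentially
    distribution_style: str = "AUTO"  # e.g. a Redshift style such as AUTO or EVEN

    def problems(self):
        """Return a list of readable problems; an empty list means usable."""
        issues = []
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, str) and not value.strip():
                issues.append(f"{f.name} must not be empty")
        return issues


# Hypothetical values; your own names will differ.
config = LoadConfig(
    staging_area="my-staging-bucket",
    stage_schema="staging",
    staging_table="stg_contact",
    target_schema="analytics",
    target_table="contact",
)
print(config.problems())
```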
8. Schedule and test your pipeline
Finally, you’ll want to schedule how frequently your pipeline runs. From a drop-down menu, you can schedule loading intervals in minutes, hours, or days, depending on your data and your workflows. In this window, you can also test your connection to make sure that your pipeline is working.
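To get a feel for what an interval means in practice, the Python sketch below works out the next few run times for a given interval. It is only an illustration of interval arithmetic: the function `next_runs` is hypothetical, and it assumes each run starts exactly one interval after the previous one, which may not match how Matillion Data Loader actually schedules runs.

```python
from datetime import datetime, timedelta

# The three interval units mentioned in this step.
ALLOWED_UNITS = ("minutes", "hours", "days")


# Illustration only: a hypothetical helper, not part of Matillion Data Loader.
def next_runs(start, every, unit, count=3):
    """Return the next `count` run times after `start` for a fixed interval."""
    if unit not in ALLOWED_UNITS:
        raise ValueError(f"unit must be one of {ALLOWED_UNITS}")
    if every <= 0:
        raise ValueError("every must be a positive number")
    step = timedelta(**{unit: every})
    return [start + step * i for i in range(1, count + 1)]


# A pipeline first run at 09:00 on 1 January 2024, repeating every 6 hours.
for run_time in next_runs(datetime(2024, 1, 1, 9, 0), every=6, unit="hours"):
    print(run_time.isoformat())
```

Listing a few upcoming runs like this is a quick way to check whether an interval is frequent enough for your data without loading more often than you need.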
9. Monitor your pipeline
When you hit Finish, Matillion Data Loader takes you to a dashboard where you’ll be able to see:
- Your successfully created pipeline and any other pipelines you configure
- Your run history to see if pipelines are running successfully or not
- The number of rows of data you’ve loaded every day from your pipeline
You’ll see this dashboard every time you log in from now on, where you can click on your pipelines to see individual information.
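The dashboard does this summarizing for you, but if you want to see the idea behind it, here is a small Python sketch. Everything in it is made up for illustration: the run records, the pipeline names, and the functions `rows_per_day` and `failed_pipelines` are hypothetical and do not come from Matillion Data Loader.

```python
from collections import defaultdict
from datetime import date

# Hypothetical run history; a real dashboard would supply its own data.
runs = [
    {"pipeline": "salesforce_to_snowflake", "day": date(2024, 1, 1), "status": "success", "rows": 1200},
    {"pipeline": "salesforce_to_snowflake", "day": date(2024, 1, 2), "status": "success", "rows": 950},
    {"pipeline": "ga_to_bigquery", "day": date(2024, 1, 2), "status": "failed", "rows": 0},
    {"pipeline": "ga_to_bigquery", "day": date(2024, 1, 2), "status": "success", "rows": 300},
]


def rows_per_day(run_history):
    """Total rows loaded per day, counting successful runs only."""
    totals = defaultdict(int)
    for run in run_history:
        if run["status"] == "success":
            totals[run["day"]] += run["rows"]
    return dict(totals)


def failed_pipelines(run_history):
    """Names of pipelines with at least one unsuccessful run, sorted."""
    return sorted({run["pipeline"] for run in run_history if run["status"] != "success"})


print(rows_per_day(runs))
print(failed_pipelines(runs))
```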
And that’s how you create a data pipeline in Matillion Data Loader. Once you build a pipeline, Matillion Data Loader saves your target warehouse credentials and source details, so building subsequent pipelines should take less time.
If you have any other questions about creating pipelines and other Matillion Data Loader features, check out our Matillion Data Loader community site, where you can post your question and search for information.
If you’re ready to begin loading your data into the cloud, sign up for Matillion Data Loader today. The product is free – just create a login and you’re ready to go. Happy migrating!