Videos

Incremental Data Load in Matillion ETL for Amazon Redshift

In our latest Amazon Redshift tutorial video we walk you through the process of setting up incremental data load in Matillion ETL for Amazon Redshift. This is useful in data staging, when you only want to fetch records since the last time you ran the job.

In this video we use the example of a Salesforce data load, but the same techniques are applicable for any data source you would like to use.

Incremental data load

The starting point for this tutorial is with a simple data load. In this tutorial we are using an example job which stages the cases from Salesforce into a target table in Amazon Redshift.

After checking that this job runs, we must then treat this as a staging table in order to make this an incremental load.

 

matillion-etl-for-redshift-incremental-data-load-salesforce

 

We then need to create a transformation job to do the update of the master table. This is a straight update from our staging table into a table update.

 

matillion-etl-for-redshift-incremental-data-load-2

 

We need to keep track of the last modified date. To do this we once again load the cases from Salesforce, but this time we are only interested in the lastmodifieddate column.

If we then do an aggregate of this without any groupings, this will give us a single row of output with the maximum value.

 

matillion-etl-for-redshift-incremental-data-load-max-value

 

We can record this value by using a create view

 

matillion-etl-for-redshift-incremental-data-load-create-view

 

Going back to our staging job, we can turn this into an incremental load by using a table iterator and the view that we just created.

 

matillion-etl-for-redshift-incremental-data-load-5

 

This means that when we do the load, we can do a data filter and only load data where the last modified date is greater than the one we already have present in the table.

 

matillion-etl-for-redshift-incremental-data-load-modified

 

This can then be used to update the master table

 

matillion-etl-for-redshift-incremental-data-load-master

 

If you want to test this, you can do so by changing a single Salesforce case and then running the job, which should return a single row loaded and a single row updated. If no changes have been made then this will return zero rows.

Matillion ETL for Amazon Redshift tutorial videos

If you’d like to find out more about Matillion ETL for Amazon Redshift, you can visit the product page here

You can also find more great videos like this on the Matillion Youtube channel. Here you’ll find lots of content, covering a whole range of topics. Everything from launching the tool on the AWS Marketplace, to creating your first job, exploring areas of functionality, new components and new releases.

If you’re already using Amazon Redshift, or even if you’re just evaluating tools, download our free eBook below for loads of useful advice and best practices on how to get the most out of your setup.
redshift-best-practice-ebook