Videos

Using Iterator Components in Matillion ETL to Load All Tables from an RDS Database

Using Iterator Components in Matillion ETL to Load All Tables from an RDS Database

Matillion uses the Extract-Load-Transform (ELT) approach to delivering quick results for a wide range of data processing purposes: everything from customer behaviour analytics, financial analysis, and even reducing the cost of synthesising DNA.

Customers frequently ask “how do I bulk import all tables from the RDS database into Amazon Redshift?” This is common question for those migrating from on-premise databases to the cloud or from one cloud database to another.

There are two popular Matillion component which connect into databases and pull data out from one or many tables using a SQL select statement. The first is the RDS Query component for cloud-based databases and the second is the Database query component for on-premise databases. These components allow a user to run one query at a time. Either of these components can be coupled with one of the two iterator components in Matillion to bulk load all tables into an RDS database.

Video

Watch our tutorial video for a walk through of some simple examples of Iteration, using Matillion ETL for Amazon Redshift.

 

Fixed Iterator

We would recommend the Fixed Iterator component for bulk table loads when you do not want to migrate all tables from your existing database to your RDS.

When using the Fixed Iterator component, you give the names of the tables from the existing database.  The component will then copy the named tables over into your new RDS. It therefore accepts a fixed list of values to be iterated over. The iterator works by passing the fixed list into a Matillion Global Environment Variable. Any component connected to the iterator can use these variable values. Therefore, by passing in the values of the RDS tables into the iterator and linking this to an RDS Query component or Database Query Component, you can copy a series of tables from a database into Amazon Redshift.

The steps to build a job to achieve this are:

  1. Bring a Fixed Iterator into the job and link this to a variable.
  2. In the Iteration Values add the names of some tables to copy from RDS to Amazon Redshift.

 

Using Iterator Components in Matillion ETL to Load All Tables from an RDS Database - Iteration Values

 

  1. Connect an RDS Query component to the Iterator and add the RDS connection details including the endpoint, database and username and password.
  2. The SQL statement should be:
 SELECT * FROM ${table}

Where ${table} is the variable used in the fixed iterator.

  1. The Target Table should also contain the variable name so the target is not overwritten on each iteration loop.

 

Using Iterator Components in Matillion ETL to Load All Tables from an RDS Database - Target Table
When this job is run the iterations can be seen as the new tables are created in the Amazon Redshift database.

 

Using Iterator Components in Matillion ETL to Load All Tables from an RDS Database - Run

 

This works well when you know the list of tables. However, you can use a Table Iterator in conjunction with an additional RDS Query component to iterate through all tables that exist on an RDS database.

Table Iterator

We would recommend the Table Iterator for when you want to bulk load ALL tables to your new RDS from your existing database.

The ability to migrate your entire database makes the Table Iterator an advanced, powerful tool. The Matillion Table Iterator component iterates over a table, looping through rows in an Amazon Redshift table. The Table Iterator then produces one iteration per row, thus providing you with a list of all the tables in your existing database.

To use the component to create a metadata table copied from an RDS or on-premise database, which contains a list of tables. After, link to the next component and iterate through the list to create copies. This will get information about the table names which already exist in the database. In a MYSQL database this information exists in the INFORMATION_SCHEMA.TABLES table.

First, query the INFORMATION_SCHEMA.TABLES table using an RDS Query component to copy the names of the tables into a new Amazon Redshift table:

 

Using Iterator Components in Matillion ETL to Load All Tables from an RDS Database - Table Iterator

 

You can subsequently use a table iterator to iterate over the results of this table. The TABLE_NAME column is linked to the table variable value:

 

Using Iterator Components in Matillion ETL to Load All Tables from an RDS Database - Table variable value

 

Finally, connect this iterator to another RDS Query component as in the Fixed Iterator Step:

 

Using Iterator Components in Matillion ETL to Load All Tables from an RDS Database - Connect Fixed iterator

 

This will create a copy of every table on the RDS database.

Hybrid approach

Lastly, a third option for you is to use the Table Iterator with the Fixed Iterator. Use the first component in the Table Iterator example to discover the tables in your existing database. Next, configure and run the Fixed Iterator, as per the first example to bulk load a selection of the tables.

Conclusion

This article looks at how to replicate an RDS database using Matillion ETL.  All query components and iterators in this example work with Matillion ETL for Amazon Redshift, Snowflake and Google BigQuery. You can find further details on Staging All Tables from an RDS Database, on the support portal. This includes information on how to do incremental updates and to download example jobs.

Useful Links

Fixed Iterator
Table Iterator
Staging All Tables from an RDS Database
RDS Query Component
Integration information
Video

Want to see the Iterator Components in Matillion ETL for Amazon Redshift? Request a demo and why not check out our Optimizing Amazon Redshift eBook