Visit Matillion AI Playground at Snowflake Data Cloud Summit 24

Find out more

Using the RDS Query Component in Matillion ETL for Snowflake to load data from PostgreSQL

Database 768x483

Matillion uses the Extract-Load-Transform (ELT) approach to delivering quick results for a wide range of data processing purposes: everything from customer behaviour analytics, financial analysis, and even reducing the cost of synthesising DNA.

The RDS Query component in Matillion ETL for Snowflake presents an easy-to-use graphical interface, enabling you to pull data from a PostgreSQL database and load it into Snowflake. Many of our customers are using this component to get all their data in one place so they can then combine this data with additional data for further analysis.

The connector is completely self-contained: no additional software installation is required. It’s within the scope of an ordinary Matillion license, so there is no additional cost for using the features.

Video

Watch our tutorial video for a demonstration on how to set up and use the RDS Query Component in Matillion ETL for Snowflake to load data from PostgreSQL.

 

Data Extraction

This component runs a SQL Query on an RDS database and copies the result to a table, via a S3 staging bucket.This component is for data-staging – getting data into a table in order to perform further processing and transformations on it in Snowflake. You should consider the target table temporary, as it will either truncate or recreate each time the components runs.

 

RDS Query Component in Matillion ETL for Snowflake - PostgreSQL - Data Extraction

 

Above is a typical workflow for importing data from a RDS database into Snowflake.

  1. RDS connects and executes SQL against the RDS.
  2. The component reads the data from PostgreSQL and streams it to a S3 Staging bucket. (Data is never written to disk).
  3. Matillion copies the data from S3 Stage buckets to to Snowflake using “Copy” command.

The RDS Server database is hosted within your VPC AWS. Your Matillion instance can reach the service by RDS endpoint URL available in your AWS console.

Authentication

Before you can use the RDS Query component there are a couple of prerequisites:

  1. The RDS Server is accessible from the Matillion instance.
  2. You have provisioned your Snowflake data warehouse and it is running on AWS platform.

Using the RDS Query Component in Matillion ETL for Snowflake to load data from PostgreSQL

The RDS Query Component can be found under the “Load/Unload” folder in the Components panel. The following image shows the properties for a configured component.

 

RDS Query Component in Matillion ETL for Snowflake - PostgreSQL - Properties

 

  • The options ‘Server’, ‘User and Password’, ‘Connection Options’ cover the details required to establish the connection to your RDS Server.
  • You will also need to specify a ‘S3 Staging Area’ which is the S3 bucket that will hold the results prior to loading into Snowflake.
  • For the ‘Schema’, you can identify the staging table on Snowflake you want to load the data into. Each time you run the component, the table will drop and rebuild. You should therefore treat this table as a Staging table and subsequently move the data into a Persistent table as soon as possible.

This RDS Query component retrieves data on a RDS and loads it into a Snowflake table. You will use a S3 bucket to stage the data. So each time you run the component, the table will truncate and reload. You may then use transformation job to enrich and manage the data in permanent tables.

Warning: This component is destructive as it truncates or recreates its target table on each run. Do not modify the target table structure manually.

Connection Options

The component has additional JDBC parameters supported by the Database Driver. The driver automatically determines the available parameters, and may change from version to version. For example, you can set SSL to enforce communication over a secure channel. See here for more JDBC options. None of the connection options are mandatory with the RDS driver, as Matillion ETL usually provides sensible defaults.

Advanced Mode

Finally, you can use Advanced Mode to issue SQL Query against the RDS. You can therefore replace any variable definitions with the appropriate values and then issue the query against your PostgreSQL database.

Orchestration Job

The following image shows a orchestration job that pulls data from a RDS Server into a target table in Snowflake using SQL Query and then copied the data to persistent table using the transformation jobs.

 

RDS Query Component in Matillion ETL for Snowflake - PostgreSQL - Orchestration

 

Conclusion

In conclusion, we hope that gave you a good overview of the RDS Query component in Matillion ETL for Snowflake. Give it a go and let us know ([email protected]) if you need any support with this component.

Useful Links

RDS Query Component in Matillion ETL for Snowflake 
PostgreSQL JDBC Driver
Video

 

The post Using the RDS Query Component in Matillion ETL for Snowflake to load data from PostgreSQL appeared first on Matillion.