BigQuery Query Component In Matillion ETL For Amazon Redshift

Matillion uses the Extract-Load-Transform (ELT) approach to delivering quick results for a wide range of data processing purposes: everything from customer behaviour analytics, financial analysis, and even reducing the cost of synthesising DNA.

 

The Google BigQuery component presents an easy-to-use graphical interface, enabling you to connect to Google BigQuery and pull tables from there into Amazon Redshift. Many of our customers are using this service to bring BigQuery data into Amazon Redshift to combine with other data.

The connector is completely self-contained: no additional software installation is required. It’s within the scope of an ordinary Matillion license, so there is no additional cost for using the features.

Authentication

The first step in configuring the Google BigQuery component is to provide the Authentication to BigQuery. The Matillion Google BigQuery component requires OAuth to be setup to authenticate Matillion to connect to BigQuery data. Further details of configuring BigQuery OAuth is available on our support center. Clicking on the 3 dots next to the Authentication property will bring a pop up box showing all available Google OAuth set up in Matillion:

BigQuery Query Component in Matillion ETL for Amazon Redshift - Authentication

Project ID and Dataset ID

Next give the Google BigQuery Project and Dataset Ids of the projects and datasets which hold your data. These are available from the BigQuery web UI:

BigQuery Query Component in Matillion ETL for Amazon Redshift - Project ID

Data Source

Now choose  what data you want to load into Amazon Redshift from the Data Source drop down. This is a list of the table available in your BigQuery dataset:

BigQuery Query Component in Matillion ETL for Amazon Redshift - Data Source

Data Selection

After choosing the data source, next choose the required fields from the data source in the Data Selection. This is a list of the columns available in the Data Source you have selected. This will form the new table to be created in Amazon Redshift.

BigQuery Query Component in Matillion ETL for Amazon Redshift - Data Selection

Data Source Filter

Additionally, you can add a filter if required. This will filter the returned data, based on the specifications you give the component. For example, this filter will run as the WHERE clause in BigQuery:

BigQuery Query Component in Matillion ETL for Amazon Redshift - Filter

Running the BigQuery Query component in Matillion ETL for Amazon Redshift

Before you can run the component you need to specify a Target Table name. This is the name of a new table that Matillion will created to write the data into in Amazon Redshift. Also a S3 Staging Area must be specified, this is a S3 bucket which is used to temporarily store the results of the query before it is loaded into Amazon Redshift.

This component also has a Limit property which forces an upper limit on the number of records returned.

You can run the Orchestration job, either manually or using the Scheduler, to query your data and bring it into Amazon Redshift.

BigQuery Query Component in Matillion ETL for Amazon Redshift - Running Component

Advanced mode

The Google BigQuery component offers an “Advanced” mode instead of the default “Basic” mode.

BigQuery Query Component in Matillion ETL for Amazon Redshift - Advanced Mode

In Advanced mode, you can write a SQL query over all the available tables in BigQuery in either Legacy or Standard SQL. Matillion then automatically translates SQL into the correct API calls to retrieve the data requested.

BigQuery Query Component in Matillion ETL for Amazon Redshift - SQL

Transforming the Data

Once you have the required BigQuery data in Amazon Redshift, you can then use it in a Transformation job, perhaps to combine with other data:

BigQuery Query Component in Matillion ETL for Amazon Redshift - Transformation

In this way, you can build out the rest of your downstream transformations and analysis, taking advantage of Amazon Redshift’s power and scalability.

Useful Links

BigQuery Query Component in Matillion ETL for Amazon Redshift
Component Data Model
Google 3rd Party OAuth Setup