Using the BigQuery Query Component in Matillion ETL for Amazon Redshift
Matillion uses the Extract-Load-Transform (ELT) approach to deliver quick results for a wide range of data processing purposes, from customer behaviour analytics and financial analysis to reducing the cost of synthesising DNA.
The Google BigQuery component presents an easy-to-use graphical interface, enabling you to connect to Google BigQuery and pull tables from there into Amazon Redshift. Many of our customers are using this service to bring BigQuery data into Amazon Redshift to combine with other data.
The connector is completely self-contained: no additional software installation is required. It’s within the scope of an ordinary Matillion license, so there is no additional cost for using the features.
The first step in configuring the Google BigQuery component is to provide the authentication details for BigQuery. The component requires OAuth to be set up so that Matillion can connect to your BigQuery data. Further details on configuring BigQuery OAuth are available on our support center. Clicking the three dots next to the Authentication property opens a pop-up box listing the Google OAuth entries set up in Matillion.
Project ID and Dataset ID
Next, provide the Project ID and Dataset ID of the Google BigQuery project and dataset that hold your data. Both are available from the BigQuery web UI.
Now choose the data you want to load into Amazon Redshift from the Data Source drop-down. This is a list of the tables available in your BigQuery dataset.
After choosing the data source, choose the required fields in Data Selection. This is a list of the columns available in the data source you have selected. The chosen columns will form the new table created in Amazon Redshift.
Data Source Filter
Additionally, you can add a filter if required. This filters the returned data according to the conditions you give the component. The filter runs as the WHERE clause of the query in BigQuery.
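To make the link between a filter and a WHERE clause concrete, here is a minimal Python sketch. It is not Matillion's implementation: the `(column, comparator, value)` tuple format, the function names, and the `sales` table are all hypothetical, and combining conditions with AND is an assumption made for illustration.

```python
from typing import List, Tuple

# Hypothetical filter rows: (column, comparator, value). This shape is
# illustrative only and is not Matillion's internal representation.
Filter = Tuple[str, str, object]

ALLOWED_COMPARATORS = {"=", "!=", "<", "<=", ">", ">="}


def quote_value(value: object) -> str:
    """Render a Python value as a SQL literal; strings are single-quoted."""
    if isinstance(value, str):
        # BigQuery string literals use backslash escapes for embedded quotes.
        return "'" + value.replace("'", "\\'") + "'"
    return str(value)


def build_select(table: str, columns: List[str], filters: List[Filter]) -> str:
    """Build a SELECT statement whose WHERE clause comes from the filters."""
    sql = "SELECT " + ", ".join(columns) + " FROM " + table
    conditions = []
    for column, comparator, value in filters:
        if comparator not in ALLOWED_COMPARATORS:
            raise ValueError(f"Unsupported comparator: {comparator}")
        conditions.append(f"{column} {comparator} {quote_value(value)}")
    if conditions:
        # Assumption: all conditions must hold, so they are joined with AND.
        sql += " WHERE " + " AND ".join(conditions)
    return sql


query = build_select(
    "sales",
    ["order_id", "amount"],
    [("country", "=", "UK"), ("amount", ">", 100)],
)
print(query)
```

With the two example filters, the sketch produces a single statement in which both conditions appear after WHERE, so only matching rows leave BigQuery.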
Running the BigQuery Query component in Matillion ETL for Amazon Redshift
Before you can run the component, you need to specify a Target Table name. This is the name of a new table that Matillion will create in Amazon Redshift to hold the data. You must also specify an S3 Staging Area: an S3 bucket used to temporarily store the results of the query before they are loaded into Amazon Redshift.
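The general stage-then-load pattern can be sketched as follows. This is an illustration under stated assumptions, not a description of what Matillion does internally: the prefix layout, the function names, the bucket, the table, and the IAM role ARN are hypothetical placeholders, and the CSV format is assumed. The only real syntax shown is Amazon Redshift's `COPY ... FROM ... IAM_ROLE ... FORMAT AS CSV`.

```python
import uuid


def staging_prefix(bucket: str, target_table: str) -> str:
    """Return a unique S3 prefix for one run's temporary files (hypothetical layout)."""
    return f"s3://{bucket}/staging/{target_table}/{uuid.uuid4().hex}/"


def copy_statement(target_table: str, s3_prefix: str, iam_role: str) -> str:
    """Build a Redshift COPY statement that loads CSV files from the staging prefix."""
    return (
        f"COPY {target_table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"FORMAT AS CSV;"
    )


# Placeholder values for illustration only.
prefix = staging_prefix("my-staging-bucket", "bigquery_orders")
statement = copy_statement(
    "bigquery_orders",
    prefix,
    "arn:aws:iam::123456789012:role/example-redshift-load-role",
)
print(statement)
```

Using a fresh prefix for each run keeps one run's temporary files apart from another's, which makes them easy to remove once the load has finished.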
This component also has a Limit property which forces an upper limit on the number of records returned.
You can run the Orchestration job, either manually or using the Scheduler, to query your data and bring it into Amazon Redshift.
The Google BigQuery component offers an “Advanced” mode instead of the default “Basic” mode.
In Advanced mode, you can write a SQL query over any of the available tables in BigQuery, in either Legacy or Standard SQL. Matillion then automatically translates the SQL into the correct API calls to retrieve the requested data.
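One practical difference between the two dialects is how a table is referenced: Standard SQL uses a backtick-quoted `project.dataset.table` path, while Legacy SQL uses `[project:dataset.table]`. The short Python sketch below formats a reference for either dialect. The helper name and the project, dataset, table, and column names are hypothetical placeholders.

```python
def table_reference(project_id: str, dataset_id: str, table: str,
                    dialect: str = "standard") -> str:
    """Format a fully qualified BigQuery table reference for the given dialect."""
    if dialect == "standard":
        # Standard SQL: backtick-quoted, dot-separated path.
        return f"`{project_id}.{dataset_id}.{table}`"
    if dialect == "legacy":
        # Legacy SQL: square brackets, with a colon after the project ID.
        return f"[{project_id}:{dataset_id}.{table}]"
    raise ValueError(f"Unknown dialect: {dialect}")


standard_ref = table_reference("my-project", "sales_data", "orders")
legacy_ref = table_reference("my-project", "sales_data", "orders", dialect="legacy")

# The same simple query, written once per dialect.
standard_query = f"SELECT order_id, amount FROM {standard_ref} WHERE amount > 100"
legacy_query = f"SELECT order_id, amount FROM {legacy_ref} WHERE amount > 100"

print(standard_query)
print(legacy_query)
```

Whichever dialect you write the query in, the table reference has to follow that dialect's own syntax.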
Transforming the Data
Once you have the required BigQuery data in Amazon Redshift, you can use it in a Transformation job, perhaps to combine it with other data.
In this way, you can build out the rest of your downstream transformations and analysis, taking advantage of Amazon Redshift’s power and scalability.
Want to try the BigQuery Query component in Matillion ETL for Amazon Redshift? Arrange a free demo, or start a free 14-day trial.