Using the Matillion ETL Cloud Storage Load Component for BigQuery
Matillion uses the Extract-Load-Transform (ELT) approach to delivering quick results for a wide range of data processing purposes: everything from customer behaviour analytics, financial analysis, and even reducing the cost of synthesising DNA.
The Cloud Storage Load component in Matillion ETL for BigQuery presents an easy-to-use graphical interface, enabling you to pull data from a JSON file stored in an Cloud Storage Bucket into a table in a BigQuery database. The Cloud Storage Load component in Matillion ETL for BigQuery is a popular feature with our customers. The most noteworthy use case for the component is its ability to load data files into BigQuery that can then be combined with external data sources for analysis and reporting.
The connector is completely self-contained: no additional software installation is required. It’s within the scope of an ordinary Matillion license, so there is no additional cost for using the features.
Watch this short tutorial video to see how to use the Cloud Storage Load Component for BigQuery.
For this example, we have a JSON file containing details of different colours. A snippet of the file is below:
The first step in configuring the Cloud Storage Load component is to provide the BigQuery table which the data in the file is to be loaded into. This table should already exist on the BigQuery database and can be selected from the dropdown list:
Next, determine what columns should be loaded. Matillion allows the user to choose specific columns in the table that are loaded with the data. Alternatively, all columns will be loaded if this is left blank.
Google Cloud Storage URL Location
The Google Cloud Storage URL is the location of the file to be loaded in the Google Cloud Storage bucket. Select the folder from the Edit Properties pop up:
The Cloud Storage Load component supports many different data file types including, Cloud Datastore Backup, CSV and JSON. Therefore, select the required file format from the drop down:
Running the Cloud Storage Load component in Matillion ETL for BigQuery
All other properties can be left as the default values. However, we recommend reviewing these, especially the Write Performance and Number of Errors Allowed.
At this point, you can run the Orchestration job to query your data and bring it into BigQuery either manually or by using the Scheduler.
Transforming the Data
Once the required data has been brought into BigQuery from the JSON file it can be transformed. Using a Transformation job you can join to existing data:
In this way, you can build out the rest of your downstream transformations and analysis, taking advantage of BigQuery’s power and scalability.