Using the Google Drive Table Component in Matillion ETL for BigQuery
Matillion uses the Extract-Load-Transform (ELT) approach to delivering quick results for a wide range of data processing purposes: everything from customer behavior analytics, financial analysis, and even reducing the cost of synthesizing DNA.
The Google Drive Table component presents an easy-to-use graphical interface, enabling you to pull data from your files stored on Google Drive directly into BigQuery. Customers are using this to pull user maintained data from their Google Drive into BigQuery, to enhance other data from different sources.
The connector is completely self-contained: no additional software installation is required. It’s within the scope of an ordinary Matillion license, so there is no additional cost for using the features.
In this example, we will pull some data from a ‘carriers’ Google Sheet saved on our Google Drive:
New Table Name
This is the name of the new table which Matillion will create in BigQuery to load the contents of the file into:
You have the options to specify whether the top row of data is the header and as such contains the column names for the new table your creating. You can also specify the Cell Range to limit the data that you push through.
Now you will need to give the metadata for the new linked table you are creating.
This is the strategy for Creating the linked table. We recommend you use ‘Replace’ unless you are certain the component will only be run once:
Google Drive URL
This is the file URL of the Google Sheet on Google Drive that you will use to create the table in BigQuery. Right click on your file on Google Drive and click “Get shareable link”.
This will copy the link into the clipboard which you can then easily paste directly into the component:
Select the format of the file on Google Drive. Matillion supports the below formats out of the box:
Number of Errors Allowed
By default Matillion will NOT create the linked table if it finds an error row in the file. However, you can set the maximum number of errors allowed in the file before the component errors
Header Rows to Skip
Set this to avoid loading any headers into the linked table.
Running the Google Drive TableComponent in Matillion ETL for BigQuery
You can run the Orchestration job manually or using the Scheduler. This will query your data once or regularly, respectively, and bring it into BigQuery.
Transforming the Data
Once you have brought the required Google Drive data into BigQuery, you can use it in a Transformation job, perhaps to enhance existing data:
In this way, you can build out the rest of your downstream transformations and analysis, taking advantage of BigQuery’s power and scalability.
For more tips on loading your data into BigQuery download our comprehensive 60+ page guide on Optimizing Google BigQuery.