Blog| Matillion ETL for BigQuery

    Using the Google Drive Table Component in Matillion ETL for BigQuery

    Matillion uses the Extract-Load-Transform (ELT) approach to delivering quick results for a wide range of data processing purposes: everything from customer behavior analytics, financial analysis, and even reducing the cost of synthesizing DNA.

    The Google Drive Table component presents an easy-to-use graphical interface, enabling you to pull data from your files stored on Google Drive directly into BigQuery. Customers are using this to pull user maintained data from their Google Drive into BigQuery, to enhance other data from different sources.

    The connector is completely self-contained: no additional software installation is required. It’s within the scope of an ordinary Matillion license, so there is no additional cost for using the features.

    In this example, we will pull some data from a ‘carriers’ Google Sheet saved on our Google Drive:

     

    Google Drive Table Component in Matillion ETL for BigQuery - gsheet

     

    New Table Name

    This is the name of the new table which Matillion will create in BigQuery to load the contents of the file into:

     

    Google Drive Table Component in Matillion ETL for BigQuery - New table

     

    You have the options to specify whether the top row of data is the header and as such contains the column names for the new table your creating. You can also specify the Cell Range to limit the data that you push through.

    Table Metadata

    Now you will need to give the metadata for the new linked table you are creating.

     

    Google Drive Table Component in Matillion ETL for BigQuery - Metadata

     

    Create/Replace

    This is the strategy for Creating the linked table. We recommend you use ‘Replace’ unless you are certain the component will only be run once:

     

    Google Drive Table Component in Matillion ETL for BigQuery - create replace

     

    Google Drive URL

    This is the file URL of the Google Sheet on Google Drive that you will use to create the table in BigQuery. Right click on your file on Google Drive and click “Get shareable link”.

     

    Google Drive Table Component in Matillion ETL for BigQuery - google drive url

     

    This will copy the link into the clipboard which you can then easily paste directly into the component:

     

    Google Drive Table Component in Matillion ETL for BigQuery - edit properties

     

    File Format

    Select the format of the file on Google Drive. Matillion supports the below formats out of the box:

     

    Google Drive Table Component in Matillion ETL for BigQuery - file format

     

    Number of Errors Allowed

    By default Matillion will NOT create the linked table if it finds an error row in the file. However, you can set the maximum number of errors allowed in the file before the component errors

    Header Rows to Skip

    Set this to avoid loading any headers into the linked table.

    Running the Google Drive TableComponent in Matillion ETL for BigQuery

    You can run the Orchestration job manually or using the Scheduler. This will query your data once or regularly, respectively, and bring it into BigQuery.

     

    Google Drive Table Component in Matillion ETL for BigQuery - run component

     

    Transforming the Data

    Once you have brought the required Google Drive data into BigQuery, you can use it in a Transformation job, perhaps to enhance existing data:

     

    Google Drive Table Component in Matillion ETL for BigQuery - transformation

     

    In this way, you can build out the rest of your downstream transformations and analysis, taking advantage of BigQuery’s power and scalability.

    Useful Resources

    Google Drive Table component

     

    For more tips on loading your data into BigQuery download our comprehensive 60+ page guide on Optimizing Google BigQuery.

    Matillion-Optimizing Google BigQuery - Email banner