Matillion uses the Extract-Load-Transform (ELT) approach to deliver quick results across a wide range of data processing purposes: everything from customer behaviour analytics and financial analysis to reducing the cost of synthesising DNA.
The Excel Query component in Matillion ETL for BigQuery presents an easy-to-use graphical interface, enabling you to connect to an Excel file stored in either a Cloud Storage bucket or an S3 bucket and pull data from that file into BigQuery. Many of our customers use this component to enhance the data in their data warehouse by bringing in supplementary, user-maintained data.
The connector is completely self-contained, so no additional software installation is required. It’s also within the scope of an ordinary Matillion license, so there is no additional cost for using the feature.
Watch this short tutorial video to see how to use the Excel Query component in Matillion ETL for BigQuery.
Before configuring the Excel Query component in Matillion, you need to configure the Environment in Matillion with AWS credentials to access the required S3 bucket, unless the S3 bucket is public.
Read more about Adding an AWS Access Policy in our support documentation.
The first step is to configure the Excel Query component. To set up the component, you’ll need to provide a link to the Excel file in the S3 bucket that you want to load.
When you click the three dots next to the Excel File property, a pop-up box appears showing all the S3 buckets available in your AWS account. Select an Excel file in a bucket. Please note, a file in a public S3 bucket can also be specified here by manually entering its S3 URL:
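For illustration, a public file could be referenced with a URL of the following form (the bucket and file names here are hypothetical):

```
s3://my-public-bucket/supplementary-data.xlsx
```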
Contains Header Row
If the first row of data in the Excel file is the header, select ‘Yes’. The header values will then become the column names in the new BigQuery table. If you select ‘No’, the columns will default to A, B, C, and so on.
If applicable, you can select a range of cells within the data; only data within that range will be loaded into BigQuery. For example, a range such as A1:D100 restricts the load to the first four columns and first 100 rows of the sheet. Specifying a cell range can be useful if the spreadsheet contains additional data that you do not want to load into the BigQuery database.
Next, choose the data to be loaded into BigQuery from the Data Source drop-down. This lists the sheets and, if applicable, the named ranges available in the Excel document.
After selecting the data source, choose the required fields in the Data Selection property. This lists the columns within the specified cell range, or all the available data detected by Matillion. In addition, Matillion can bring through the Excel row ID. The selected fields will form the new table created in BigQuery.
Connection Options are additional parameters supported by the driver. They are not mandatory, as the Excel driver provides sensible defaults. Find further details on Connection Options in our support documentation.
Running the Excel Query component in Matillion ETL for BigQuery
Before running the component you must name the Target Table. This is the new table that will be created in BigQuery to hold the data. A Cloud Storage Staging Area must also be specified: a Cloud Storage bucket used to temporarily store the results of the query before they are loaded into BigQuery.
The component also has a Limit property, so you can set an upper limit on the number of records returned.
You can run the Orchestration job, either manually or using the Scheduler, to query your data and bring it into BigQuery.
The Excel Query component offers an “Advanced” mode as an alternative to the default “Basic” mode.
In Advanced mode, you can write a SQL-like query over all the available fields in the data model. This is automatically translated into the correct API calls to retrieve the data requested.
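As an illustration, an Advanced-mode query might look something like the following, selecting a few fields from one sheet and filtering the rows (the sheet and column names here are hypothetical):

```sql
SELECT "Customer ID",
       "Region",
       "Annual Spend"
FROM "Sheet1"
WHERE "Annual Spend" > 1000;
```

Matillion handles translating a query like this into the underlying calls against the Excel file, so you only need to describe the data you want.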
Transforming the Data
Finally, once you have loaded the required data into BigQuery from the Excel spreadsheet, it becomes available for use in a Transformation job. You may perform a transformation, for example, to enhance existing data:
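For instance, a Transformation job might join the newly loaded Excel data onto an existing table to add a user-maintained column. Expressed as BigQuery SQL, the logic could resemble the following sketch (the project, dataset, table, and column names are all hypothetical):

```sql
SELECT o.order_id,
       o.customer_id,
       x.account_manager  -- supplementary column maintained in the spreadsheet
FROM `my_project.my_dataset.orders` AS o
LEFT JOIN `my_project.my_dataset.excel_customers` AS x
  ON o.customer_id = x.customer_id;
```

In Matillion, you would build the same logic visually with Table Input and Join components rather than writing the SQL by hand.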
In this way, you can build out the rest of your downstream transformations and analysis, taking advantage of BigQuery’s power and scalability.