Using the Excel Query component in Matillion ETL for Snowflake
Matillion uses the Extract-Load-Transform (ELT) approach to deliver quick results for a wide range of data processing purposes: everything from customer behaviour analytics and financial analysis to reducing the cost of synthesising DNA.
The Excel Query component in Matillion ETL for Snowflake presents an easy-to-use graphical interface, enabling you to connect to an Excel file stored in an S3 bucket and pull data into Snowflake. Many of our customers use this component to enhance their data warehouse by bringing in supplementary user-maintained data.
The connector is completely self-contained: no additional software installation is required. It’s within the scope of an ordinary Matillion license, so there is no additional cost for using the features.
Watch our tutorial video for a demonstration on how to set up and use the Excel Query component in Matillion ETL for Snowflake.
To configure the Excel Query component, first provide a link to the Excel file in the S3 bucket to be loaded. Click on the 3 dots next to the Excel File property to see all available S3 buckets in your AWS account, then select an Excel file in a bucket. Please note, a file in a public S3 bucket can also be specified here by manually entering the S3 URL.
Contains Header Row
If the first row of data in the Excel file is the header, select ‘Yes’ and the header values will be the column names in the new Snowflake table. Selecting ‘No’ will result in the columns being named A, B, C etc.
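The naming rule above can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not Matillion's implementation: the function names `column_letter` and `name_columns` are hypothetical, and the continuation past column Z (AA, AB, ...) is an assumption based on Excel's own column lettering.

```python
def column_letter(index: int) -> str:
    """Return the Excel-style letter for a zero-based column index (0 -> A, 26 -> AA)."""
    if index < 0:
        raise ValueError("index must be non-negative")
    letters = ""
    n = index + 1  # switch to one-based so that 1 -> A and 26 -> Z
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def name_columns(rows, contains_header_row):
    """Split raw sheet rows into (column_names, data_rows).

    With a header row, the first row supplies the names and is removed
    from the data. Without one, columns are named A, B, C and so on
    (assumption: lettering continues as AA, AB past column Z).
    """
    if not rows:
        return [], []
    if contains_header_row:
        return [str(value) for value in rows[0]], rows[1:]
    return [column_letter(i) for i in range(len(rows[0]))], rows
```

For example, `name_columns([["id", "name"], [1, "Ann"]], True)` gives the names `["id", "name"]` and one data row, while passing `False` keeps both rows as data under the names `["A", "B"]`.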
If applicable, select a range of cells within the data. Please note, only data within the range will be loaded into Snowflake. Specifying a cell range can be useful if you have additional data in the spreadsheet which you do not want users to load into the Snowflake database.
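To make the effect of a cell range concrete, here is a minimal Python sketch that restricts a sheet (held as a list of rows) to a range written in the usual Excel form, such as A1:B3. It assumes the range uses that notation; the helper names `parse_cell` and `slice_range` are hypothetical and are not part of Matillion.

```python
import re

# One cell reference: column letters followed by a one-based row number.
_CELL = re.compile(r"^([A-Z]+)([1-9][0-9]*)$")


def parse_cell(ref: str):
    """Convert a reference such as 'B2' into zero-based (row, column) indexes."""
    match = _CELL.match(ref.strip().upper())
    if match is None:
        raise ValueError(f"not a valid cell reference: {ref!r}")
    letters, row_number = match.groups()
    column = 0
    for ch in letters:
        # Treat the letters as a base-26 number where A=1 ... Z=26.
        column = column * 26 + (ord(ch) - ord("A") + 1)
    return int(row_number) - 1, column - 1


def slice_range(rows, cell_range: str):
    """Return only the cells of `rows` that fall inside `cell_range`."""
    start, end = cell_range.split(":")
    (r1, c1), (r2, c2) = parse_cell(start), parse_cell(end)
    top, bottom = sorted((r1, r2))
    left, right = sorted((c1, c2))
    return [row[left:right + 1] for row in rows[top:bottom + 1]]
```

With a sheet of three columns, `slice_range(rows, "A1:B3")` keeps the first two columns of the first three rows, so anything outside that rectangle never reaches the target table.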
Next, select the data to be loaded into Snowflake from the Data Source drop-down. This is a list of the sheets or named ranges available in the Excel document.
After selecting the data source, choose the required fields in the Data Selection. This is a list of the columns in the specified Cell Range or, if no range is given, in the data detected by Matillion. In addition, Matillion can bring through the Excel Row Id. The selected fields will form the new table which is created in Snowflake.
Connection Options are additional parameters supported by the driver. They are not mandatory, as the Excel driver offers sensible defaults. Find further details on Connection Options in our support documentation.
Running the Excel Query component in Matillion ETL for Snowflake
Before running the component you must name the Target Table. This is the name of a new table which will be created in Snowflake to hold the data. An S3 Staging Area must also be specified. This is an S3 bucket used to temporarily store the query results before they are loaded into Snowflake.
This component also has a Limit property which can be used to force an upper limit on the number of records returned.
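As a rough illustration of what an upper limit on returned records means, the following Python sketch caps an iterable of records. It is not how Matillion implements the Limit property; the name `apply_limit` and the convention that `None` means "no limit" are assumptions made for the example.

```python
from itertools import islice


def apply_limit(records, limit=None):
    """Return at most `limit` records; `None` (assumed here) means no limit."""
    if limit is None:
        return list(records)
    if limit < 0:
        raise ValueError("limit must be non-negative")
    # islice stops reading as soon as `limit` records have been taken.
    return list(islice(records, limit))
```

A small limit is a convenient way to test a job against a large spreadsheet before loading it in full.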
You can run the Orchestration job to query your data and bring it into Snowflake either manually or using the Scheduler.
The Excel Query component offers an “Advanced” mode instead of the default “Basic” mode. In Advanced mode, you can write a SQL-like query over all the available fields in the data model. This is automatically translated into the correct API calls to retrieve the data requested.
Transforming the Data
Once the required data has been brought from the Excel spreadsheet into Snowflake, it can be used in a Transformation job, perhaps to enhance existing data.
In this way, you can build out the rest of your downstream transformations and analysis, taking advantage of Snowflake’s power and scalability.
Want to try the Excel Query component in Matillion ETL for Snowflake? Arrange a free 1-hour training session now, or start a free 14-day trial.