Using the Excel Query component in Matillion ETL for Amazon Redshift
Matillion uses the Extract-Load-Transform (ELT) approach to delivering quick results for a wide range of data processing purposes: everything from customer behaviour analytics, financial analysis, and even reducing the cost of synthesising DNA.
The Excel Query component in Matillion ETL for Amazon Redshift presents an easy-to-use graphical interface, enabling you to connect to an Excel file stored on an S3 Bucket and pull data into Amazon Redshift. Many of our customers are using this service to enhance their data warehouses by bringing in supplementary user maintained data sources.
The connector is completely self-contained: no additional software installation is required. It’s within the scope of an ordinary Matillion license, so there is no additional cost for using the features.
Watch our tutorial video for a demonstration on how to set up and use the Excel Query component in Matillion ETL for Amazon Redshift.
To configure the Excel Query component first, provide a link to the Excel File in the S3 bucket to be loaded. Click on the 3 dots next to the Excel File property to see all available S3 buckets in your AWS account. Select an Excel file in a bucket. Please note, a file in a public S3 bucket can be specified here by manually entering the S3 URL:
Contains Header Row
If the first row of data in the Excel file is the header, select ‘Yes’ and the header values will become the column names in the new Amazon Redshift table. Selecting ‘No’ will result in the columns being named A, B, C and so on.
If applicable, select a range of cells within the data. Please note, only data within the range will be loaded into Amazon Redshift. Specifying a cell range can be useful if the spreadsheet has additional data that you don’t want users to load into the Amazon Redshift database.
Next, select the data source to be loaded into Amazon Redshift from the Data Source drop down. This is a list of the sheets or named ranges available in the Excel document.
After selecting the data source, choose the required fields from the data source in the Data Selection. This is a list of the columns in the specified Cell Range or available data detected by Matillion. In addition, Matillion can bring through the Excel Row Id. This will form the new table which is created in Amazon Redshift.
These are additional parameters supported by the driver. The Excel driver usually provides sensible defaults and therefore, doesn’t mandate the configuration of Connection Options. Find further details on Connections Options in our support documentation.
Running the Excel Query
Before you run the component, give the Target Table a name. This is subsequently the name of the new table created to write the data into Amazon Redshift. Also an S3 Staging Area must be specified. This is an S3 bucket to temporarily store the query results before loading it into Amazon Redshift.
This component also has a Limit property, which can be used to force an upper limit on the number of records returned.
You can run the Orchestration job, either manually or using the Scheduler, to query your data and bring it into Amazon Redshift.
The Excel Query component offers an “Advanced” mode instead of the default “Basic” mode. In Advanced mode, you can write a SQL-like query over all the available fields in the data model. This is automatically translated into the correct API calls to retrieve the data requested.
Transforming the Data
Once the required data has been brought into Amazon Redshift from the Excel Spreadsheet, it can then be used in a Transformation job. A noteworthy transformation is to enhance existing data:
In this way, you can build out the rest of your downstream transformations and analysis, taking advantage of Amazon Redshift’s power and scalability.
Want to try the Excel Query component in Matillion ETL for Amazon Redshift? Arrange a free 1-hour training session now, or start a free 14-day trial.