Using the Excel Query component in Matillion ETL for Amazon Redshift

Using the Excel Query component in Matillion ETL for Amazon Redshift

Excel Query component in Matillion ETL for Amazon RedshiftMatillion uses the Extract-Load-Transform (ELT) approach to delivering quick results for a wide range of data processing purposes: everything from customer behaviour analytics, financial analysis, and even reducing the cost of synthesising DNA.

The Excel Query component in Matillion ETL for Amazon Redshift presents an easy-to-use graphical interface, enabling you to connect to an Excel file stored on an S3 Bucket and pull data into Amazon Redshift. Many of our customers are using this service to enhance their data warehouses by bringing in supplementary user maintained data sources.

The connector is completely self-contained: no additional software installation is required. It’s within the scope of an ordinary Matillion license, so there is no additional cost for using the features.

Watch this short tutorial video to see how to use the Excel Query component in Matillion ETL for Amazon Redshift.

 

 

Excel File

To configure the Excel Query component first, provide a link to the Excel File in the S3 bucket to be loaded. Click on the 3 dots next to the Excel File property to see all available S3 buckets in your AWS account. Select an Excel file in a bucket. Please note, a file in a public S3 bucket can be specified here by manually entering the S3 URL:

 

Matillion-ETL-Redshift-ExcelQueryComponent-ExcelFile

 

Matillion-ETL-Redshift-ExcelQueryComponent-S3URL

 

Contains Header Row

If the first row of data in the Excel file is the header, select ‘Yes’ and the header values will become the column names in the new Amazon Redshift table. Selecting ‘No’ will result in the columns being named A, B, C and so on.

Cell Range

If applicable, select a range of cells within the data. Please note, only data within the range will be loaded into Amazon Redshift. Specifying a cell range can be useful if the spreadsheet has additional data that you don’t want users to load into the Amazon Redshift database.

Data Source

Next, select the data source to be loaded into Amazon Redshift from the Data Source drop down. This is a list of the sheets or named ranges available in the Excel document.

 

Matillion-ETL-Redshift-ExcelQueryComponent-DataSource

 

After selecting the data source, choose the required fields from the data source in the Data Selection. This is a list of the columns in the specified Cell Range or available data detected by Matillion. In addition, Matillion can bring through the Excel Row Id. This will form the new table which is created in Amazon Redshift.

 

Matillion-ETL-Redshift-ExcelQueryComponent-EditProperties

 

Connection Options

These are additional parameters supported by the driver. The Excel driver usually provides sensible defaults and therefore, doesn’t mandate the configuration of Connection Options. Find further details on Connections Options in our support documentation.

Running the Excel Query

Before you run the component, give the Target Table a name. This is subsequently the name of the new table created to write the data into Amazon Redshift. Also an S3 Staging Area must be specified. This is an S3 bucket to temporarily store the query results before loading it into Amazon Redshift.

This component also has a Limit property, which can be used to force an upper limit on the number of records returned.

You can run the Orchestration job, either manually or using the Scheduler, to query your data and bring it into Amazon Redshift.

 

Matillion-ETL-Redshift-ExcelQueryComponent-Run

 

Advanced mode

The Excel Query component offers an “Advanced” mode instead of the default “Basic” mode. In Advanced mode, you can write a SQL-like query over all the available fields in the data model. This is automatically translated into the correct API calls to retrieve the data requested.

 

Matillion-ETL-Redshift-ExcelQueryComponent-AdvancedMode

 

Matillion-ETL-Redshift-ExcelQueryComponent-SQLQuery

 

Transforming the Data

Once the required data has been brought into Amazon Redshift from the Excel Spreadsheet, it can then be used in a Transformation job. A noteworthy transformation is to enhance existing data:

 

Matillion-ETL-Redshift-ExcelQueryComponent-Transformation

 

In this way, you can build out the rest of your downstream transformations and analysis, taking advantage of Amazon Redshift’s power and scalability.

Info

Want to try the Excel Query component in Matillion ETL for Amazon Redshift? Arrange a free 1-hour training session now, or start a free 14-day trial.