Videos

    Using the Excel Query component in Matillion ETL for Amazon Redshift

    Excel Query component in Matillion ETL for Amazon RedshiftMatillion uses the Extract-Load-Transform (ELT) approach to delivering quick results for a wide range of data processing purposes: everything from customer behaviour analytics, financial analysis, and even reducing the cost of synthesising DNA.

    The Excel Query component in Matillion ETL for Amazon Redshift presents an easy-to-use graphical interface, enabling you to connect to an Excel file stored on an S3 Bucket and pull data into Amazon Redshift. Many of our customers are using this service to enhance their data warehouses by bringing in supplementary user maintained data sources.

    The connector is completely self-contained: no additional software installation is required. It’s within the scope of an ordinary Matillion license, so there is no additional cost for using the features.

    Video

    Watch our tutorial video for a demonstration on how to set up and use the Excel Query component in Matillion ETL for Amazon Redshift.

     

     

    Excel File

    To configure the Excel Query component first, provide a link to the Excel File in the S3 bucket to be loaded. Click on the 3 dots next to the Excel File property to see all available S3 buckets in your AWS account. Select an Excel file in a bucket. Please note, a file in a public S3 bucket can be specified here by manually entering the S3 URL:

     

    Matillion-ETL-Redshift-ExcelQueryComponent-ExcelFile

     

    Matillion-ETL-Redshift-ExcelQueryComponent-S3URL

     

    Contains Header Row

    If the first row of data in the Excel file is the header, select ‘Yes’ and the header values will become the column names in the new Amazon Redshift table. Selecting ‘No’ will result in the columns being named A, B, C and so on.

    Cell Range

    If applicable, select a range of cells within the data. Please note, only data within the range will be loaded into Amazon Redshift. Specifying a cell range can be useful if the spreadsheet has additional data that you don’t want users to load into the Amazon Redshift database.

    Data Source

    Next, select the data source to be loaded into Amazon Redshift from the Data Source drop down. This is a list of the sheets or named ranges available in the Excel document.

     

    Matillion-ETL-Redshift-ExcelQueryComponent-DataSource

     

    After selecting the data source, choose the required fields from the data source in the Data Selection. This is a list of the columns in the specified Cell Range or available data detected by Matillion. In addition, Matillion can bring through the Excel Row Id. This will form the new table which is created in Amazon Redshift.

     

    Matillion-ETL-Redshift-ExcelQueryComponent-EditProperties

     

    Connection Options

    These are additional parameters supported by the driver. The Excel driver usually provides sensible defaults and therefore, doesn’t mandate the configuration of Connection Options. Find further details on Connections Options in our support documentation.

    Running the Excel Query

    Before you run the component, give the Target Table a name. This is subsequently the name of the new table created to write the data into Amazon Redshift. Also an S3 Staging Area must be specified. This is an S3 bucket to temporarily store the query results before loading it into Amazon Redshift.

    This component also has a Limit property, which can be used to force an upper limit on the number of records returned.

    You can run the Orchestration job, either manually or using the Scheduler, to query your data and bring it into Amazon Redshift.

     

    Matillion-ETL-Redshift-ExcelQueryComponent-Run

     

    Advanced mode

    The Excel Query component offers an “Advanced” mode instead of the default “Basic” mode. In Advanced mode, you can write a SQL-like query over all the available fields in the data model. This is automatically translated into the correct API calls to retrieve the data requested.

     

    Matillion-ETL-Redshift-ExcelQueryComponent-AdvancedMode

     

    Matillion-ETL-Redshift-ExcelQueryComponent-SQLQuery

     

    Transforming the Data

    Once the required data has been brought into Amazon Redshift from the Excel Spreadsheet, it can then be used in a Transformation job. A noteworthy transformation is to enhance existing data:

     

    Matillion-ETL-Redshift-ExcelQueryComponent-Transformation

     

    In this way, you can build out the rest of your downstream transformations and analysis, taking advantage of Amazon Redshift’s power and scalability.

    Useful Links

    Excel Query component in Matillion ETL for Amazon Redshift
    Connection Options
    Integration information
    Video

    Want to try the Excel Query component in Matillion ETL for Amazon Redshift? Arrange a free 1-hour training session now, or start a free 14-day trial.