Using the S3 Load component in Matillion ETL for Snowflake
Matillion uses the Extract-Load-Transform (ELT) approach to delivering quick results for a wide range of data processing purposes: everything from customer behaviour analytics, financial analysis, and even reducing the cost of synthesising DNA.
The S3 Load component in Matillion ETL for Snowflake presents an easy-to-use graphical interface, enabling you to pull data from a JSON file stored in an S3 Bucket into a table in a Snowflake database. The S3 Load component in Matillion ETL for Snowflake is a popular feature with our customers. The most noteworthy use case for the component is its ability to load data files into Snowflake that you can subsequently combine with external data sources for analysis and reporting. Please note, however, that Snowflake only supports the loading of a JSON file into a single column in a Snowflake table.
The connector is completely self-contained: no additional software installation is required. It’s within the scope of an ordinary Matillion license, so there is no additional cost for using the features.
S3 Object Prefix
The S3 Object Prefix is the full file path to the S3 Objects that you want to load into Snowflake. You can use an object prefix to loop through the many files that you are going to load into Snowflake. First, select the file from the Properties box:
The next step in configuring the S3 Load component is for you to provide the Snowflake table. This is the table where you want the data in the S3 file is to be loaded into. This table should already exist on the Snowflake database and can be selected from the dropdown list:
Matillion allows the user to choose specific columns in the table that are loaded with the data. Alternatively, all columns will be loaded if this is left blank.
The S3 Load component supports many different data file types including Avro, Delimited, Fixed Width and JSON file types. Therefore, select the required File Type from the drop down:
Running the S3 Load component in Matillion ETL for Snowflake
You can leave all other properties as the default values. At this point, you can run the Orchestration job to query your data and bring it into Snowflake, either manually or by using the scheduler.
Transforming the Data
Once the required data is in Snowflake from the JSON file, you can use it in a transformation. A Transformation job will join new data with existing data, for example:
In this way, you can build out the rest of your downstream transformations and analysis, taking advantage of Snowflake’s power and scalability.