At Matillion, we design purpose-built ETL products for each cloud data warehouse we support. Our approach gives users a seamless experience, allowing them to use cloud data warehouse functionality from within Matillion ETL. For this reason, we are continually updating our products to reflect advances in cloud data warehouse technology. In Matillion ETL for Snowflake version 1.47, we have introduced a number of improvements and enhancements for our Snowflake users.
What’s new and improved in Matillion ETL for Snowflake?
File pattern matching for external tables
The CREATE [OR REPLACE] EXTERNAL TABLE statement in Snowflake has been enhanced with a PATTERN parameter that lets users specify which files on the external stage to match, using regular expression syntax.
For example, the pattern ‘.*flight.*[.]csv’ matches any file with a .csv extension that contains the text ‘flight’ anywhere in its filename.
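The pattern above can be tried out with Python's re module. This is only an illustration of the matching behaviour; Snowflake applies its own regex engine to the staged file paths, so edge cases may differ:

```python
import re

# The pattern from the example above. Snowflake matches it against the
# staged file path; Python's regex syntax is close enough to illustrate.
pattern = re.compile(r".*flight.*[.]csv")

files = [
    "2020/flight_data_jan.csv",  # matches: contains 'flight', ends in .csv
    "2020/flights.csv.gz",       # no match: does not end in .csv
    "2020/passengers.csv",       # no match: no 'flight' in the name
]

matched = [f for f in files if pattern.fullmatch(f)]
print(matched)
```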
Set the default DDL collation for a table
The recently released DEFAULT_DDL_COLLATION parameter can be used to set a default collation specification for new table columns that do not have an explicit collation specification.
Any currently supported collation specifiers can be provided as values for this parameter.
Load semi-structured data directly into target tables based on column name matches
This release introduces support for loading semi-structured data directly into target table columns that match the columns represented in the input data file. The feature is implemented through a new MATCH_BY_COLUMN_NAME copy option.
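The semantics of matching by column name can be sketched in Python. This is an illustrative model only (the function and its case-insensitive matching rule are assumptions for the sketch, not Snowflake's implementation): each record's keys are routed to target columns with matching names, and unmatched target columns receive NULL.

```python
def load_by_column_name(records, target_columns):
    """Route each semi-structured record's fields to target columns whose
    names match (case-insensitively here); unmatched columns become None."""
    rows = []
    for record in records:
        lookup = {key.lower(): value for key, value in record.items()}
        rows.append({col: lookup.get(col.lower()) for col in target_columns})
    return rows

records = [
    {"ID": 1, "City": "Leeds", "extra": "ignored"},  # 'extra' has no target column
    {"id": 2, "CITY": "York"},
]
rows = load_by_column_name(records, ["id", "city", "country"])
print(rows)
```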
New aggregate function – skew
This newly added function returns the sample skewness of non-NULL records. In basic terms, this describes how asymmetric the underlying distribution is.
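One common definition of sample skewness, the adjusted Fisher-Pearson coefficient, can be computed by hand to see what the function reports. This is a sketch of the statistic, not Snowflake's implementation, whose exact formula may differ slightly:

```python
import math

def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness. Non-numeric (None) records
    are ignored, mirroring the skip-NULL behaviour described above."""
    xs = [x for x in values if x is not None]
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))  # sample std dev
    m3 = sum((x - mean) ** 3 for x in xs)
    return n / ((n - 1) * (n - 2)) * m3 / s ** 3

print(sample_skewness([1, 2, 3, 4, 100]))  # positive: long right tail
print(sample_skewness([1, 2, 3, 4, 5]))    # 0.0: symmetric data
```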
Display source jobs under folders in Migration Wizard
In previous versions of Matillion ETL for Snowflake, if a project was set up to use folders to better organize jobs, the Migration Wizard did not make use of those folders when migrating jobs: all jobs were shown as a single flat list of job names.
With this release, jobs are now listed under the appropriate folders, making it easier to navigate and to select or deselect the jobs of interest.
For example, if the project is set up with a folder structure, the Migration Wizard will display the job selection grouped under those same folders.
Support for unloading to Parquet file format with LZO compression
The location-based COPY INTO command now supports Parquet files with LZO (.lzo) compression in either an internal (i.e. Snowflake) stage or an external location.
As part of this change, the SNAPPY_COMPRESSION parameter has been deprecated and replaced by the COMPRESSION parameter and its associated values.
The original SNAPPY_COMPRESSION=TRUE is now expressed as COMPRESSION=SNAPPY (which is also implied by selecting AUTO for the compression type).
Converting null specifiers and trimming white space
The table-based COPY INTO command now offers support for the NULL_IF and TRIM_SPACE file format options when loading string values from semi-structured data into separate columns in relational tables. This was previously supported only for CSV data and is now extended to JSON, Avro, ORC, and Parquet file types.
The data entries representing a null value are entered, one per line, through the Null If parameter:
Note the blank entry as the second identifier. This will identify empty strings within the incoming data.
The TRIM_SPACE parameter is set either to True (remove surrounding white space from string values) or False (retain the white space).
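The combined effect of the two options can be sketched in a few lines of Python. The helper below is a hypothetical model of the transformation (including a blank entry in the null list, as in the example above), not Snowflake's actual load path:

```python
def apply_file_format(value, null_if=("NULL", ""), trim_space=True):
    """Model of TRIM_SPACE followed by NULL_IF for one incoming string value.
    The blank entry in null_if catches empty strings, as described above."""
    if trim_space:
        value = value.strip()  # remove surrounding white space
    if value in null_if:
        return None            # convert matching entries to SQL NULL
    return value

print(apply_file_format("  NULL "))  # None: trimmed, then matched by NULL_IF
print(apply_file_format("   "))      # None: trims down to the empty string
print(apply_file_format("  York "))  # 'York'
```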
New window calculation function – Kurtosis
This newly added function returns the population excess kurtosis of non-NULL records. The inclusion of this function allows you to determine how far the data departs from a normal distribution.
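Population excess kurtosis is the fourth standardized moment minus 3, so a normal distribution scores 0. The sketch below computes that textbook definition; Snowflake's KURTOSIS may apply a different bias correction:

```python
def population_excess_kurtosis(values):
    """Fourth standardized moment minus 3, using population variance.
    None records are ignored, mirroring the skip-NULL behaviour above."""
    xs = [x for x in values if x is not None]
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n  # population variance
    m4 = sum((x - mean) ** 4 for x in xs) / n   # fourth central moment
    return m4 / var ** 2 - 3

# Negative values indicate lighter tails than a normal distribution,
# positive values heavier tails. A symmetric two-point distribution
# gives the minimum possible value of -2.
print(population_excess_kurtosis([-1, 1, -1, 1]))  # -2.0
```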
Loading CSV data replacing invalid characters
The table-based COPY INTO command now offers support for loading CSV files even if the data contains invalid UTF-8 characters. To support this functionality, we now provide an additional parameter, Replace Invalid Characters, that can be set to True (replace invalid characters with the Unicode replacement character) or False (generate an error when invalid UTF-8 characters are detected in the data).
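The two behaviours correspond to the standard replace-versus-strict decoding modes, which Python's bytes.decode demonstrates directly (an analogy for illustration, not Matillion's internal code):

```python
raw = b"caf\xff"  # \xff is not a valid UTF-8 byte sequence

# Replace Invalid Characters = True: substitute the Unicode
# replacement character U+FFFD and carry on loading.
print(raw.decode("utf-8", errors="replace"))  # 'caf\ufffd'

# Replace Invalid Characters = False: raise an error instead.
try:
    raw.decode("utf-8", errors="strict")
except UnicodeDecodeError as exc:
    print("error:", exc.reason)
```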
Loading CSV data ignoring blank records
The table-based COPY INTO command now offers support for loading CSV files even if the data contains blank records. To support this functionality, we now provide an additional parameter, Skip Blank Lines, that can be set to True (ignore blank records) or False (generate an end of record error).
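The True setting simply drops blank records rather than treating them as malformed rows, which a small Python csv example illustrates (an analogy for the behaviour, not Matillion's implementation):

```python
import csv
import io

data = "id,city\n1,Leeds\n\n2,York\n"  # note the blank record in the middle

# Skip Blank Lines = True: blank records are ignored and the
# remaining rows load normally.
rows = [row for row in csv.reader(io.StringIO(data)) if row]
print(rows)
```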
Support for appended rows in streams
The CREATE STREAM component now includes a parameter, Append Only, to specify the stream as append-only. When set to True, the stream will track row inserts into a table but will ignore updates and deletes. This can provide a performance improvement over the standard behaviour of the stream, where Append Only is set to the default value of False.
Learn more about Matillion ETL v1.47
Help shape future versions of Matillion ETL in the Matillion Community
Do you have an idea for the Matillion ETL for Snowflake product roadmap? Head over to the Community and submit your feature request to the Ideas Portal.