Last week I represented Matillion at the inaugural Snowflake Summit at Union Square Hilton, San Francisco 3rd to 6th June 2019. I want to thank all the attendees that stopped by our booth to share their challenges on their legacy systems and excitement and successes on the Snowflake platform. As the Research and Development Lead at Matillion I attended the keynotes and technical sessions to see how Snowflake is disrupting the traditional database market and what new innovations they are releasing over the coming months. Throughout the summit, Snowflake delivered an excellent mixture of technical Keynotes announcing their new features and upcoming products with emotive sessions about how Snowflake customers are using the cloud data warehouse to improve their businesses and change lives with data.
From a feature point of view, I’m especially excited about the new Snowflake support for External Tables and the launch of the Snowflake Data Exchange – an app store like service allowing users to access third-party data sets directly in their Snowflake accounts. This has the potential to be a game changer. Snowflake’s innovation and technical depth is pushing data to new limits and is truly revolutionizing the way modern businesses think about and storing and process data.
At the summit, Snowflake announced the new External Tables feature on Tuesday 4th June. External Tables allows customers to access data directly in external stages (e.g. S3 or Azure Blob Storage) without loading it into Snowflake storage first (taking both time and Snowflake storage credits). This is useful for accessing large Data Lakes or querying log files directly, as just two examples of using this feature.
At Matillion we pride ourselves on our deep product alignment. What do we mean by that? Matillion is the only purpose-built data transformation tool for the cloud and more specifically Snowflake. This is why, within Matillion ETL for Snowflake, you will find Snowflake specific components such as Alter Warehouse and Create File Format.
Matillion ETL for Snowflake users can update to the latest version for first-class features for creating External Tables over Snowflake stages and flattening the semi-structured data that is made available by this technology.
Matillion ETL for Snowflake provides a graphical wizard for creating and managing Stages (connection details to external data stores):
As Matillion is built-for-cloud the Stage Management feature can aid in the creation of stages by presenting drop-down lists of S3 buckets and other options – preventing the need to write manual SQL.
Create External Table Component
This new component allows users to create the new External Tables as part of Matillion Orchestration jobs over all types of supported External data:
In future releases, Matillion will allow users to set, along with the standard new table properties:
- Virtual Columns – for when the schema is known (or leave blank for Schema on Read)
- Partition Columns – specifying how (and if) the external stage is partitioned on the external filesystem
- Format – to specify an existing Snowflake File Format object (or the File Format settings can be set directly in the component)
Refresh External Table
In many cases Snowflake’s Auto-Refresh functionality removes the need to manually issue a Refresh command – however, in some cases (such as when Matillion has moved Data into the stage and it needs immediate processing) it makes sense to Manually refresh the External Table. This component allows this to occur as part of a Matillion orchestration job:
Extract Nested Data
This Matillion Transform component will allow users to graphically define the structure of the data contained within an External Table in order to flatten it to a relational style dataset which can then be processed by other Matillion components.
The Autofill button saves the users having to manually define the data structure – using this feature the tool will analyze the incoming Variant field (from the External Table) to present a graphical tree view of the data and allow users to simply click the fields they wish to use downstream:
Snowflake Data Exchange allows Snowflake customers to purchase access to datasets (such as Weather information) created by third parties. Unlike traditional approaches to accessing information from data providers (such as REST APIs or using FTP to download CSV files) Data Exchange allows users to access a provider’s data directly in their Snowflake account. This means that:
- There is no need to configure complex jobs to connect to external systems such as REST or FTP
- Data is always up to date as user’s are hooking directly into the providers data in Snowflake
As Snowflake Data Exchange purchases are exposed as standard tables within a user’s Snowflake account Matillion ETL for Snowflake can be leveraged to create visual Transform jobs allowing a user to quickly and easily join the third party data to the user’s own data tables. Taking this further Matillion ETL for Snowflake can be used to filter, aggregate and cleanse the data as required just as if you were working with traditional tables owned and controlled by the user.
Overall I was very impressed by first Snowflake Summit, both in terms of content and organisation. Snowflake continue to push the boundaries of modern data warehousing in the Cloud and I’m already looking forward to next year’s Snowflake Summit 1st to 4th June 2020 in Las Vegas.