Snowflake External Tables allow you to query data stored in cloud storage — such as Amazon S3, Azure Blob Storage, or Google Cloud Storage — without having to load it into Snowflake first. These tables act as a virtual layer referencing external files, making them ideal when you want to keep data in its original location while still taking advantage of Snowflake's powerful query engine.
Using External Data with Matillion's Data Productivity Cloud
Within the Data Productivity Cloud, we can build Custom Connectors to make custom API calls and load the data into Snowflake by referencing the connector in a pipeline in Designer. There are a few options for handling the responses from the API calls that the Custom Connector makes and receives:
We can load the response(s) directly into a table in Snowflake without the need for any cloud storage. This is the typical use for Custom Connector components in Designer
We can take the responses and upload them to cloud storage such as an S3 bucket or Blob containers
These options are controlled by the ‘Destination’ properties on the Custom Connector component:
After some research, I thought it would be interesting to take the responses from the API calls I am making with the Custom Connector, upload them to an S3 bucket in AWS, and use external tables in Snowflake to read the response files from the bucket and populate a table that way.
To do this, there are a few prerequisite steps that need to be taken before we can make full use of this scenario:
We need to create a storage integration in Snowflake to authorize access between Snowflake and the S3 bucket referenced in the Custom Connector component in Designer
We also need to create an external stage in Snowflake that will leverage the storage integration and ultimately be used by the External table to populate table data
Once these prerequisite steps are completed, we can leverage Designer and the SQL Script orchestration component to create the external table in Snowflake, which will read the responses uploaded by the Custom Connector and populate our external table. Over time, as more response files are uploaded to cloud storage, we can build in a mechanism to refresh the external table to include the new response files, or even explore refreshing the table automatically.
Overall, as a simple design pattern, this would look like:
Use the SQL Script component to create the external table (it is recommended to include the IF NOT EXISTS clause in the create table script, as this keeps the script safe to re-run on future executions)
Make API Calls with the Custom Connector component and configure it to upload response files to cloud storage
Use the SQL Script component to refresh the external table so it will include any new response files uploaded from the Custom Connector component
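The first and last steps above can be sketched in SQL. This is a minimal sketch, assuming an external stage named api_responses_stage already exists and the response files are JSON; all object names are illustrative:

```sql
-- Step 1: create the external table only if it doesn't already exist,
-- so the script stays safe to re-run (names are illustrative)
CREATE EXTERNAL TABLE IF NOT EXISTS api_responses_ext (
    response VARIANT AS (VALUE)  -- VALUE is the implicit VARIANT column exposing each file row
)
LOCATION = @api_responses_stage
FILE_FORMAT = (TYPE = JSON)
AUTO_REFRESH = FALSE;

-- Step 3: refresh the table's metadata so newly uploaded response files are included
ALTER EXTERNAL TABLE api_responses_ext REFRESH;
```

With AUTO_REFRESH = FALSE, the REFRESH statement is what picks up new files; alternatively, auto-refresh can be configured with cloud event notifications so the refresh happens without a pipeline step.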
SQL code to create a Snowflake Storage Integration and External Stage
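Below is a minimal sketch of the two prerequisite objects. The integration name, IAM role ARN, and bucket path are placeholders and must match your own AWS setup:

```sql
-- Storage integration: authorizes Snowflake to read from the S3 bucket
-- (the role ARN and bucket path below are placeholders)
CREATE STORAGE INTEGRATION IF NOT EXISTS s3_api_responses_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::<account-id>:role/<snowflake-access-role>'
  STORAGE_ALLOWED_LOCATIONS = ('s3://<my-bucket>/responses/');

-- External stage: points at the bucket location via the storage integration
CREATE STAGE IF NOT EXISTS api_responses_stage
  STORAGE_INTEGRATION = s3_api_responses_int
  URL = 's3://<my-bucket>/responses/'
  FILE_FORMAT = (TYPE = JSON);
```

After creating the integration, running DESC STORAGE INTEGRATION s3_api_responses_int returns the IAM user and external ID that Snowflake generates, which then need to be added to the trust policy of the AWS role.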
Pros:
Data does not have to be loaded into Snowflake, and external tables can populate table data from a cloud storage location
External tables can be automatically refreshed so when new files land in cloud storage, the table data reflects new data
External tables support query and join operations
Views can be created from external tables
Cons:
External tables are read-only, so no DML (Data Manipulation Language) operations can be run against them
The creation of external tables and required dependencies (stages, storage integrations) can require some technical familiarity with Snowflake to configure and set up.
Data files can build up in cloud storage over time; because external tables are populated directly from the files in cloud storage, those files must remain in place
There are a few different variations for setting up external tables, involving the creation of storage integrations and stages (technically, an external table can be created and populated without a storage integration, using only a stage, but the stage then needs to be set up slightly differently)
Conclusion:
As we can see, there are a few ways to interact with and populate tables in Snowflake from API responses made by Custom Connectors in Designer.
Matillion is the data pipeline platform that empowers data teams to build and manage pipelines faster for AI and analytics at scale. Its flexibility is exemplified by its capability to interact with and populate tables in Snowflake in a number of different ways. Matillion's versatile UI, pre-built components, and code options including SQL, Python, and dbt allow you to seamlessly integrate external tables, helping to support all your AI and analytics workloads.
External tables in Snowflake rely on external stages and file formats to define where the data is stored and how it should be interpreted. Once configured, Snowflake can run SQL queries over the data in-place, ideal for data lake architectures, data ingestion pipelines, or exploratory analytics workflows.
What file formats do Snowflake External Tables support?
Snowflake External Tables support a variety of file formats including JSON, Parquet, ORC, and CSV. These formats allow you to work with both structured and semi-structured data stored in cloud object storage, making them well-suited for modern data lake environments.
When should you use an External Table?
You should consider using an External Table when you want to analyze data without loading it into Snowflake, especially if the data already exists in cloud storage. This approach is useful when working with raw or infrequently accessed datasets, or when you want to minimize storage costs by avoiding duplication.
Can Matillion integrate with Snowflake External Tables?
Yes, Matillion's Data Productivity Cloud makes it easy to integrate with Snowflake External Tables. You can use Custom Connectors to call APIs and write responses to cloud storage, then reference those files in Snowflake through External Tables. Matillion pipelines can automate this process, helping you streamline data ingestion and transformation.
How do External Tables perform compared to native Snowflake tables?
External Tables typically offer slower performance than native Snowflake tables because the data is read directly from cloud storage. However, they are highly efficient for large, read-heavy workloads where loading data into the warehouse would be unnecessary or expensive.
Can External Tables be queried with standard SQL?
Yes, External Tables can be queried using ANSI SQL in Snowflake. You can join them with other tables, apply filters and aggregations, and use them in views or downstream transformations just like any other table type.
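For example, an external table can be filtered, aggregated, and joined with a native table like any other relation. The table and column names below are illustrative, assuming JSON response files exposed through the implicit VALUE column:

```sql
-- Join JSON API responses in an external table with a native customers table
SELECT
    e.value:customer_id::STRING AS customer_id,
    c.customer_name,
    COUNT(*)                    AS api_events
FROM api_responses_ext e
JOIN customers c
  ON c.customer_id = e.value:customer_id::STRING
GROUP BY 1, 2;
```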
Do External Tables support partitioning?
Snowflake does support partitioning for External Tables. You can define partition columns that correspond to folder structures in your cloud storage, which helps Snowflake optimize performance by only scanning relevant partitions during a query.
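A minimal sketch, assuming response files land in date-named folders such as s3://<my-bucket>/responses/2024-01-15/; the object names and the folder depth passed to SPLIT_PART are illustrative and depend on your bucket layout:

```sql
-- Partition the external table on a date derived from the file path
-- (the '/' index in SPLIT_PART must match your own folder structure)
CREATE EXTERNAL TABLE IF NOT EXISTS api_responses_part (
    load_date DATE    AS TO_DATE(SPLIT_PART(METADATA$FILENAME, '/', 2)),
    response  VARIANT AS (VALUE)
)
PARTITION BY (load_date)
LOCATION = @api_responses_stage
FILE_FORMAT = (TYPE = JSON);

-- Filtering on the partition column lets Snowflake prune to matching folders
SELECT * FROM api_responses_part
WHERE load_date = '2024-01-15';
```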
Are Snowflake External Tables secure?
Yes, Snowflake External Tables are built with security in mind. They inherit security controls from your cloud storage provider, support encrypted data access, and use secure authentication methods like IAM roles or access tokens. When used with Matillion, you can maintain secure, governed pipelines from source to storage to Snowflake.