Life hacks for Snowflake users

Unlock the full potential of your data with Snowflake Data Cloud - a revolutionary platform designed for unparalleled speed and performance. Snowflake's unique approach, separating compute and storage, empowers you to scale your cloud data warehouse effortlessly, accommodating peak workloads while ensuring swift and cost-effective data queries.

This blog contains life hacks for users new to the Snowflake Data Cloud to help you get the most out of the platform.

Life Hack 1: Manage your Snowflake Virtual Warehouse to minimize costs

You must use a virtual warehouse any time you read and process data, and it is the main consumer of Snowflake credits. Credits pay for the processing time used by each virtual warehouse, so charges depend on how many virtual warehouses you run, how long they run for, and their sizes.

To conserve credits, it's important that your virtual warehouse is not running when it's not needed. The best way to achieve this is to switch on auto suspend and auto resume. For example, to have a virtual warehouse auto-suspend after 10 minutes (600 seconds) of inactivity, use a SQL client to issue the following statement:

ALTER WAREHOUSE my_vwh_name SET AUTO_SUSPEND = 600;

If you want to avoid waiting 10 minutes, you can immediately suspend any running virtual warehouse:

ALTER WAREHOUSE my_vwh_name SUSPEND;

To avoid processing delays, have your virtual warehouse automatically resume as soon as it is needed:

ALTER WAREHOUSE my_vwh_name SET AUTO_RESUME = TRUE;

These SQL statements are also available through the Data Productivity Cloud's low-code Alter Warehouse component.
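
These settings can also be applied when a warehouse is first created. As a minimal sketch, reusing the hypothetical warehouse name from above:

CREATE WAREHOUSE IF NOT EXISTS my_vwh_name
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 600
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;

Setting INITIALLY_SUSPENDED = TRUE means the new warehouse consumes no credits until its first query arrives.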

Life Hack 2: Tune Warehouse size to balance performance and cost

Snowflake offers a range of virtual warehouse sizes, from XSMALL (costing 1 credit per hour) to 6X-LARGE (costing 512 credits per hour), each providing a different amount of computing power.

Remember that a smaller virtual warehouse might not be the most cost-effective choice. A larger virtual warehouse incurs higher per-second costs but delivers faster task completion; conversely, a smaller virtual warehouse offers cost savings at the expense of slower execution. It is common to deploy many different-sized virtual warehouses within a single account, each optimized for a different workload.
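
Resizing is a single statement, which makes it easy to experiment until you find the sweet spot. A minimal sketch, again using a hypothetical warehouse name:

ALTER WAREHOUSE my_vwh_name SET WAREHOUSE_SIZE = 'LARGE';

A running warehouse can be resized on the fly: queries that start after the change use the new size, while in-flight queries finish at the old one.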

One virtual warehouse can serve many client sessions simultaneously while keeping each session logically isolated. If the warehouse becomes overloaded, additional queries are queued until the necessary resources become available. From the user perspective, the session simply hangs: queries are suspended and may even be canceled if the maximum queue time parameter is exceeded.
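
The queue time limit mentioned above is the STATEMENT_QUEUED_TIMEOUT_IN_SECONDS parameter, which can be set at the warehouse level (its default of 0 means queued queries never time out). For example, to cancel any query that has waited in the queue for more than five minutes:

ALTER WAREHOUSE my_vwh_name SET STATEMENT_QUEUED_TIMEOUT_IN_SECONDS = 300;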

Snowflake Enterprise Edition introduces Multi-cluster Warehouses, which scale horizontally by automatically starting additional clusters when demand crosses a threshold and shutting them down as it falls. You define lower and upper bounds on the number of clusters, and a scaling policy controls how eagerly Snowflake scales between them. This feature means more sessions can share a single virtual warehouse without resizing it manually.
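
A minimal sketch of turning the hypothetical warehouse above into a multi-cluster warehouse (this requires Enterprise Edition or higher):

ALTER WAREHOUSE my_vwh_name SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3
  SCALING_POLICY = 'STANDARD';

With these settings, Snowflake runs between one and three clusters, starting an extra cluster when queries begin to queue and retiring it as demand falls.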

Life Hack 3: Leverage Snowflake's processing power for Data Transformation

Snowflake users performing data integration should take advantage of Snowflake's own processing power for data transformation. That means performing data extraction and loading first, followed by in-database transformations using SQL - the ELT pattern.

Leveraging Snowflake in this way allows you to process, clean, and enrich your data without moving it out of the platform, making transformations efficient and scalable.
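
As a minimal sketch, assume a raw_orders table has just been loaded with every column held as a string (all table and column names here are hypothetical):

CREATE OR REPLACE TABLE clean_orders AS
SELECT order_id,
       UPPER(TRIM(customer_name)) AS customer_name,  -- standardize names
       TO_NUMBER(order_amount)    AS order_amount    -- string to number
FROM raw_orders;

The whole transformation runs inside Snowflake, so it scales with your virtual warehouse rather than with any external ETL server.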

Life Hack 4: Use TRY functions for improved DataOps

One of the most common data transformations needed during data integration is converting between data types.

Incoming data that has just been copied from a source system is typically high quality but low consumability. It is high quality because it is actively being used to run the business. But the people who create that data are only interested in operating that one system, and are not at all concerned with downstream data users.

The result is that many conversions are needed to make the schema easier to use. The most common examples are flattening semistructured data (typically from VARIANT columns) and converting strings into dates or numbers.
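
Flattening, for example, is typically done with Snowflake's LATERAL FLATTEN construct. A minimal sketch, assuming a hypothetical raw_events table whose VARIANT payload column contains an items array:

SELECT f.value:id::NUMBER   AS item_id,
       f.value:name::STRING AS item_name
FROM raw_events e,
     LATERAL FLATTEN(INPUT => e.payload:items) f;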

Snowflake has a large number of built-in SQL functions to do these conversions quickly and efficiently - such as CAST, PARSE_JSON, TO_GEOGRAPHY, TO_NUMERIC and TO_DATE.

In DataOps terms, the main drawback is that SQL statements run in batch mode - for example, one million records at a time. If a datatype conversion fails on even one record, the entire statement errors out, and that single bad record can be tricky to find.

For this reason, Snowflake offers a set of TRY% functions - such as TRY_CAST, TRY_PARSE_JSON, TRY_TO_GEOGRAPHY, TRY_TO_NUMERIC and TRY_TO_DATE. These are functionally equivalent to their counterparts, except that when a conversion fails they return NULL rather than raising an error.
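
A minimal sketch of the difference, using inline VALUES purely for illustration:

-- TO_NUMBER would abort the whole statement on the bad row;
-- TRY_TO_NUMBER returns NULL for it instead
SELECT raw_value,
       TRY_TO_NUMBER(raw_value) AS numeric_value
FROM (VALUES ('100'), ('oops')) AS t(raw_value);

The same trick isolates the offending records, solving the needle-in-a-haystack problem described above:

SELECT raw_value
FROM (VALUES ('100'), ('oops')) AS t(raw_value)
WHERE raw_value IS NOT NULL
  AND TRY_TO_NUMBER(raw_value) IS NULL;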

Using these TRY% functions - for example, within a Calculator component - helps make data transformation pipelines more robust.

Get Started with Matillion Today!

In case you missed it, here is Matillion's real-world guide to Optimizing Snowflake.

To learn more about Matillion and the Data Productivity Cloud, visit www.matillion.com to explore interactive demos, sign up for a 14-day trial, or participate in a Virtual Hands-on Lab, building your own version of the pipelines described above.

Ian Funnell

Data Alchemist

Ian Funnell, Data Alchemist at Matillion, curates The Data Geek weekly newsletter and manages the Matillion Exchange.