- Blog
- 05.15.2024
- Leveraging AI, Data Fundamentals, Product
Think outside the container: Hands on with Snowpark Container Services and Matillion

Snowpark Container Services (SPCS) allows Snowflake users to run Docker containers directly within the Snowflake ecosystem, bringing speed, efficiency, and complete data sovereignty. This new service supports both small and large generative AI language models that can be tailored to your own specific tasks.
The Matillion Data Productivity Cloud complements Snowflake by offering code-optional interfaces to the Data Cloud. This combination of technologies optimizes data engineering and AI pipeline management on an enterprise scale.
Together, these platforms empower organizations to accelerate data-driven decision-making processes while maintaining robust data governance.
Technical Details of Snowpark Container Services
Import your customized Docker images into your own private image repository, and Snowpark Container Services can run them for you, either as a long-running service - like an interface to a Large Language Model - or as a batch job invoked through a user-defined SQL function.
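To make the two invocation styles concrete, here is a minimal sketch of the SQL involved, held in Python strings. The object names (`llm_service`, `summarize`, the `documents` table) and the endpoint path are illustrative assumptions, not part of any real deployment.

```python
# Sketch of the two SPCS invocation styles. All object names below
# (llm_service, summarize, documents) are hypothetical placeholders.

# 1) Long-running service: a service function routes SQL calls to an
#    HTTP endpoint exposed by the running container.
create_service_function = """
CREATE OR REPLACE FUNCTION summarize(doc VARCHAR)
  RETURNS VARCHAR
  SERVICE = llm_service   -- the running SPCS service
  ENDPOINT = api          -- endpoint name from the service spec
  AS '/summarize'         -- HTTP path inside the container
"""

# 2) Batch usage: once defined, the function is applied over a whole
#    table like any other SQL function.
batch_query = """
SELECT doc_id, summarize(doc_text) AS summary
FROM documents
"""

print(create_service_function.strip())
print(batch_query.strip())
```

The point of the split is that the container itself is the same in both cases; only the way Snowflake routes work to it differs.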
With SPCS, you can deploy a diverse range of language models, from large ones like Llama 3 to smaller models available through the Hugging Face Hub, for example. The models all run entirely within the secure boundary of your own Snowflake environment.
This setup prioritizes data security and also provides customizable hardware configurations. Users can select GPUs for processing-intensive large models or opt for CPUs for less demanding models, making the system adaptable to different needs and budgets.
The pricing structure is transparent and usage-based, providing cost-effective options for varying enterprise demands.
Industry Use Cases for Snowpark Container Services
SPCS is particularly beneficial for sectors that require stringent data privacy and security measures. Industries governed by regulations such as HIPAA in healthcare, or Sarbanes-Oxley and PCI DSS in financial services, find assurance in the complete data sovereignty offered by SPCS.
The SPCS architecture ensures that sensitive data does not leave the secure perimeter of the Snowflake environment, which is crucial for government, telecommunications, and energy utilities handling confidential information.
Additionally, the adaptability of Snowpark Container Services allows for efficient handling of tasks like Personally Identifiable Information (PII) detection, which smaller language models can execute exceptionally well, especially after fine-tuning.
This flexibility, coupled with stringent data governance capabilities, positions SPCS as an indispensable tool for modern data-driven industries.
Data Summarization with an LLM in SPCS
I'll bring to life a use case for running a large language model inside SPCS. Imagine you are managing a set of documentation that's full of technical jargon, and is difficult to follow. You need a way to generate short summaries, but the documents contain confidential information, and policy does not permit sending them externally for processing.
A good way to tackle this is with a large language model - such as Meta Llama 3 70B - running in Snowpark Container Services. With a Matillion Snowpark Container Prompt, you can send the document(s) for summarization while keeping them inside Snowflake at all times.
Afterward, you can join the summarized outputs back to the originals to keep the information together. Once again, all this happens inside Snowflake.
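In outline, the summarize-then-join pattern looks like the following minimal Python sketch. The real pipeline runs inside Snowflake via a Matillion Snowpark Container Prompt; here a stub stands in for the Llama 3 call, and the record shape (`doc_id`, `text`) is an illustrative assumption.

```python
def summarize(text: str) -> str:
    # Stub: in SPCS this would be a call to the containerized LLM.
    # Here we just pretend the first sentence is the summary.
    return text.split(".")[0] + "."

documents = [
    {"doc_id": 1, "text": "The indemnification clause survives termination. Further detail follows."},
    {"doc_id": 2, "text": "Licensee shall not reverse engineer the software. More terms apply."},
]

# Step 1: generate one summary per document.
summaries = [{"doc_id": d["doc_id"], "summary": summarize(d["text"])} for d in documents]

# Step 2: join the summaries back to the originals on doc_id,
# keeping the full text and its short summary together.
by_id = {s["doc_id"]: s["summary"] for s in summaries}
joined = [{**d, "summary": by_id[d["doc_id"]]} for d in documents]

for row in joined:
    print(row["doc_id"], "->", row["summary"])
```

In the Matillion pipeline, both steps are components operating on Snowflake tables rather than Python lists, but the data flow is the same.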
Here's how it looks in the Matillion Data Productivity Cloud. The original text is previewed, and the short summary can be seen at the bottom.

Summarizing legal documents with a Matillion Snowpark Container Prompt component
Because Llama 3 is a large language model, the entire operation works using its base functionality alone, without the need for any additional configuration or customization.
PII detection using a Matillion Snowpark Container Prompt
Now onto another example. This task involves detecting personally identifiable information (PII) within text data. Specifically, it centers on scanning medical notes, and automating the detection of - for example - names and phone numbers using machine learning.
For this kind of task, a small to medium-sized language model such as Mixtral 8x7B, housed within an SPCS container, is sufficient. The same applies generally to any medium-sized model that has been fine-tuned for PII detection. It runs faster than a large language model, and at a lower cost.
Data from an input table containing the medical notes is read, analyzed by the language model, and the examination results are stored in a new table. Data never leaves Snowflake at any point.
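A minimal sketch of that table-in, table-out flow, with plain Python lists standing in for Snowflake tables and a regex stub in place of the Mixtral model (all names and the phone-number pattern are illustrative):

```python
import re

def detect_pii(note: str) -> dict:
    # Stub detector standing in for the Mixtral 8x7B service in SPCS.
    # It flags anything that looks like a US-style phone number; a real
    # model would also catch names, addresses, and so on.
    phones = re.findall(r"\b\d{3}-\d{3}-\d{4}\b", note)
    return {"has_pii": bool(phones), "items": phones}

# "Input table" of medical notes (illustrative shape).
medical_notes = [
    {"note_id": 101, "note": "Patient reports improvement. Callback 555-867-5309."},
    {"note_id": 102, "note": "No adverse reactions observed at follow-up."},
]

# Analyze each row and store the results in a new "table".
pii_results = [
    {"note_id": row["note_id"], **detect_pii(row["note"])}
    for row in medical_notes
]

for row in pii_results:
    print(row)
```

The output table keeps the `note_id` key so the results can later be joined back to the original notes.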

PII detection with a Matillion Snowpark Container Prompt component
In PII detection, two questions are usually posed to the language model:
- Determine the presence of any PII in the medical notes (Yes/No).
- If PII is detected, extract and list the identifiable details explicitly. This makes the PII easier to remove later.
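The two questions above can be packed into a single prompt, with the model's reply parsed afterwards. A sketch, where the prompt wording and the assumed reply format ("Yes"/"No" on the first line, one "- item" line per detail) are illustrative assumptions that a real deployment would tune against the chosen model:

```python
def build_pii_prompt(note: str) -> str:
    # Combine both questions into one instruction; wording is illustrative.
    return (
        "Read the medical note below.\n"
        "1. Does it contain any personally identifiable information? Answer Yes or No.\n"
        "2. If Yes, list each identifiable detail on its own line prefixed with '- '.\n\n"
        f"Note: {note}"
    )

def parse_pii_reply(reply: str) -> dict:
    # Assume "Yes"/"No" on the first line, then one "- item" line per detail.
    lines = [ln.strip() for ln in reply.strip().splitlines()]
    has_pii = lines[0].lower().startswith("yes")
    items = [ln[2:] for ln in lines[1:] if ln.startswith("- ")]
    return {"has_pii": has_pii, "items": items}

# Example with a hand-written reply (not real model output):
reply = "Yes\n- John Smith\n- 555-867-5309"
print(parse_pii_reply(reply))
```

Structured replies like this are what make it practical to land the model's output in a Snowflake table column.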
After joining the original notes back to the corresponding output entries from the Mixtral service, here's how a sample of test data appears in a Matillion data pipeline.

PII detection in a Matillion data pipeline
Just like the PII detection, the join and the data sampling operate entirely inside Snowflake.
From here, we can take a range of actions, such as segregating or removing PII from the records. This process is fundamental for maintaining privacy and complying with data protection regulations.
Want to see this in action on video? Check it out here:
Summary
Matillion and Snowpark Container Services provide a secure and flexible platform for data engineers to incorporate AI into data workflows, ensuring data sovereignty within Snowflake’s environment.
This solution supports the use of cutting-edge open-source AI models and offers a choice between CPUs and GPUs for different tasks. With an emphasis on cost-effectiveness, it enables precise budget control by managing Snowflake credit consumption.
Discover more about our joint capabilities at the Snowflake Summit, and get ready to enhance your own data pipelines with AI integration while keeping sensitive data secure!
Ian Funnell
Data Alchemist
Ian Funnell, Data Alchemist at Matillion, curates The Data Geek weekly newsletter and manages the Matillion Exchange.
Follow Ian on LinkedIn: https://www.linkedin.com/in/ianfunnell