Integrate data from InterSystems Caché to Snowflake using Matillion

Our InterSystems Caché to Snowflake connector efficiently transfers your data to Snowflake within minutes, ensuring it remains current—without the need for manual coding or managing complicated ETL processes.


Extracting data from InterSystems Caché to Snowflake

Extracting data from InterSystems Caché is an essential step for organizations seeking to leverage advanced analytics and cloud data warehousing platforms such as Snowflake. Seamlessly moving data from the robust Caché database into Snowflake ensures that critical business information is available for analysis, reporting, and decision-making. This article provides a concise, practical guide to facilitating this data transfer. We will begin by outlining how to create an appropriate user identity in InterSystems Caché to ensure secure and authorized access. For organizations using Matillion to orchestrate data workflows, we will discuss how to check for the necessary JDBC driver or acquire it if needed. The article will also address how to establish reliable network connectivity between your Caché system (the source) and Snowflake (the target), a crucial step for uninterrupted data transfer. Finally, we will cover how to query data from Caché—providing guidance on both initial data loads and strategies for ongoing, incremental data extraction. By following these steps, readers will be equipped to efficiently migrate their data from InterSystems Caché into Snowflake, laying the foundation for scalable analytics and business intelligence.


What is InterSystems Caché?

InterSystems Caché is a high-performance, multi-model database ideal for applications needing massive scalability, reliability, and advanced data management. It supports key-value, object, and relational models within one engine, ensuring flexibility without losing speed or consistency. Features include in-memory caching, ACID transactions, and rapid concurrent processing. Caché offers broad interoperability—supporting ODBC, JDBC, JPA, REST, and SOAP—for seamless integration with modern frameworks. Its multi-threaded architecture efficiently handles both transactional and analytic workloads. Widely used in healthcare, finance, and logistics, Caché is trusted for robust, rapid, and reliable data storage and access in critical environments.


What is Snowflake?

Snowflake is a fully managed cloud data platform designed for modern data warehousing. Its multi-cluster, shared-data architecture separates compute and storage, allowing independent, seamless scaling and cost optimization. Snowflake supports structured and semi-structured data (e.g., JSON, Avro, Parquet), with ANSI SQL compatibility for straightforward analytics integration. Key features include time travel, zero-copy cloning, and secure data sharing across organizations. Available on AWS, Azure, and Google Cloud, Snowflake eliminates traditional maintenance overhead and simplifies data management.

Why Move Data from InterSystems Caché into Snowflake

Unlocking Analytical Power: The Benefits of Copying Data from InterSystems Caché to Snowflake

A data engineer or architect may find significant benefits in copying data from InterSystems Caché into Snowflake. InterSystems Caché often serves as the transactional backbone of core business applications and contains high-value data that is critical for analytical and operational decision-making. However, the true potential of this data is frequently realized only when it is integrated with information from other sources, enabling comprehensive analysis and more informed insights. By migrating or synchronizing this data into Snowflake—a modern, scalable data platform—organizations can efficiently perform large-scale analytics and data integration without burdening the transactional systems of InterSystems Caché. This separation of analytical and operational workloads helps maintain the performance and stability of core business systems, while leveraging the advanced features and flexibility of Snowflake to drive data-driven innovation.

Creating a User in InterSystems Caché Database

This guide explains how to create a user (identity) in InterSystems Caché using both the Management Portal and the SQL shell.


1. Using the Management Portal

  1. Log in to the Caché Management Portal (typically at http://<server>:57772/csp/sys/UtilHome.csp).
  2. Navigate to System Administration > Security > Users.
  3. Click Create New User.
  4. Enter the required details:
    • User Name (e.g., myuser)
    • Full Name and/or Description (optional)
    • Password: enter and confirm the password.
  5. Assign roles to the user as required.
  6. Click Save to create the user.

2. Using SQL Scripting

While InterSystems Caché security users are not stored as database SQL users (as in traditional RDBMS), you can operate on user records using embedded SQL or programmatically via the %SYS namespace and associated classes.

Here's an example using SQL to add a user, via the management tables:

INSERT INTO %SYS.USER
    (Name, PasswordHash, Enabled)
VALUES
    ('myuser', '<password_hash>', 1);

Note: The PasswordHash is not the plaintext password. To set the password in practice, use the ObjectScript API as shown below.


3. Using ObjectScript

To create a user and assign a password, use an ObjectScript routine in the %SYS namespace:

```objectscript
// Open the Terminal in the %SYS namespace
Do ##class(Security.Users).Create("myuser", "mypassword")
```

To assign roles or enable the user, invoke additional methods:

```objectscript
Set sc = ##class(Security.Users).Modify("myuser", , , , , , , , "YourRole")
```

Replace "YourRole" with the desired role, e.g., "%All". Multiple roles can be assigned as a comma-separated list.


Additional Notes

  • Permissions: You must have the appropriate administrative permissions to perform these actions.
  • Namespaces: User accounts affect access across namespaces; roles control namespace-specific permissions.
  • Scripting: Use caution to avoid exposing plain-text passwords and follow your organization's security guidelines.

Installing the JDBC driver

To connect Matillion Data Productivity Cloud to an InterSystems Caché database, you must manually install the appropriate JDBC driver. As of this writing, the Caché JDBC driver is not bundled with Matillion due to licensing and redistribution restrictions. You will need to obtain and upload the driver yourself before you can create database integrations.

1. Download the JDBC Driver

Begin by obtaining the JDBC driver directly from InterSystems:

  • Visit the official InterSystems Caché product page at: https://www.intersystems.com/products/cach%C3%A9/
  • Locate the downloads section and search specifically for the Type 4 JDBC driver. Type 4 drivers are preferred because they are pure Java and do not require native libraries.

Due to licensing terms, you might need to log in or request access through your organization’s InterSystems account.

2. Prepare the Driver File

After you have downloaded the driver jar file, take note of its file location. Always refer to vendor documentation or precise download instructions to ensure you are using a compatible driver version for both your Caché database version and your Java runtime.
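Because a .jar file is just a zip archive, you can confirm which driver build you actually downloaded by reading its manifest before uploading it. The following Python sketch is illustrative; the exact header keys (such as Implementation-Version) vary between vendor builds, so check what your jar actually contains:

```python
import zipfile

def jar_manifest_info(jar_path):
    """Return the main-section entries of a .jar file's MANIFEST.MF.

    A .jar is a zip archive, so the standard library suffices. This
    simple parser ignores manifest line continuations (lines beginning
    with a space), which is fine for short values like version numbers.
    """
    with zipfile.ZipFile(jar_path) as jar:
        manifest = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
    info = {}
    for line in manifest.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            info[key] = value.strip()
    return info
```

Printing the returned dictionary shows the version headers the vendor embedded, which you can compare against your Caché server version and Java runtime requirements.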

3. Upload the Driver to Matillion Data Productivity Cloud

To add the driver to your Matillion environment:

  1. Consult the official Matillion documentation for uploading external JDBC drivers: Uploading External JDBC Drivers.
  2. Follow the steps to access your Matillion Agent’s interface and upload the Caché JDBC driver .jar file in accordance with these instructions.
  3. Confirm that the driver is visible and correctly registered within the Matillion Agent after upload.

Note that depending on your Matillion deployment method or cloud environment, the location and interface to upload the driver may differ—ensure you select the correct environment and agent.

4. Configure Your Database Connection in Matillion

With the driver installed, you can now create database connections from within Matillion Data Productivity Cloud using the Caché JDBC driver. For detailed, step-by-step usage instructions, refer to the official guide here: Using the Database Query Component.

Within the Matillion user interface, select the Caché driver as your connection class and supply all required configuration parameters for successful connectivity.

Be sure to reference both the Caché and Matillion documentation if you encounter configuration or compatibility issues during this process.
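The Caché Type 4 driver conventionally uses connection URLs of the form jdbc:Cache://host:port/NAMESPACE, with 1972 as the default superserver port and com.intersys.jdbc.CacheDriver as the driver class; verify both against the documentation shipped with your driver version. A small helper to assemble the URL you would paste into Matillion's connection settings:

```python
def cache_jdbc_url(host, namespace, port=1972):
    """Assemble an InterSystems Caché JDBC connection URL.

    1972 is the conventional Caché superserver port and
    com.intersys.jdbc.CacheDriver the usual driver class; confirm both
    against your driver's own documentation before use.
    """
    return f"jdbc:Cache://{host}:{port}/{namespace}"

# e.g. cache_jdbc_url("cache-prod.internal", "USER")
#      -> "jdbc:Cache://cache-prod.internal:1972/USER"
```

The hostname and namespace shown are hypothetical; substitute the values for your own Caché deployment.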

Checking network connectivity

To ensure successful integration between your Matillion Data Productivity Cloud environment and your InterSystems Caché database, you must verify that the appropriate network connectivity is in place. Depending on your deployment architecture, follow the relevant guidance below:

1. Full SaaS agent configuration
If you are using the Full SaaS agent configuration, incoming connections to your InterSystems Caché database must be permitted from the IP addresses used by Matillion’s Data Productivity Cloud SaaS platform. You can find the up-to-date list of these IP addresses at:
https://docs.matillion.com/data-productivity-cloud/security/docs/allowing-ip-addresses/

Ensure these IP addresses are added to your firewall or network security group to allow communication from the platform to your database.

2. Hybrid SaaS deployment
If you are deploying in Hybrid SaaS mode, you need to permit incoming connections to your InterSystems Caché database from the address range of your own Virtual Private Cloud (VPC), where your Matillion Hybrid SaaS agent is running. To help you verify network access and troubleshoot connectivity, visit:
https://exchange.matillion.com/data-productivity-cloud/pipeline/check-network-access/

Make sure that network policies (such as security groups, firewalls, and routing tables) allow traffic from your VPC to reach the InterSystems Caché instance.

DNS Resolution
If your InterSystems Caché database is referenced using a DNS hostname (rather than directly by IP address), you must ensure that the Full SaaS or Hybrid SaaS agent can resolve this DNS address to the correct IP. Confirm that DNS resolution is possible from the network environment where the agent is running to avoid connectivity issues.
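Both DNS resolution and TCP reachability can be smoke-tested from the environment where the agent runs. A minimal Python sketch, assuming the default Caché superserver port 1972 (adjust for your deployment):

```python
import socket

def check_database_reachable(host, port=1972, timeout=5):
    """Check DNS resolution and TCP reachability of a database host.

    Returns (resolved_ip, reachable). A DNS failure raises
    socket.gaierror, which cleanly separates name-resolution problems
    from firewall or routing problems.
    """
    ip = socket.gethostbyname(host)  # raises socket.gaierror if DNS fails
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True
    except OSError:
        return ip, False
```

If the hostname resolves but the connection attempt times out or is refused, look at security groups, firewalls, and routing rather than DNS.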

Querying Data from InterSystems Caché Database

This guide describes how to query data from an InterSystems Caché database, focusing on SQL SELECT statements, considerations for datatype conversion between Caché and Snowflake, and implementing an initial plus incremental load pattern using the Database Query component.


Example SQL SELECT Statements

You can query InterSystems Caché data using standard SQL SELECT syntax. Here are some typical examples:

```sql
-- Select all data from a table
SELECT * FROM Sample.Person;

-- Select specific columns
SELECT ID, Name, DOB FROM Sample.Person;

-- Filtering results
SELECT ID, Name FROM Sample.Person WHERE DOB >= '2000-01-01';

-- Using JOINs
SELECT p.Name, a.Street
FROM Sample.Person AS p
JOIN Sample.Address AS a ON p.AddressID = a.ID;

-- Aggregation
SELECT COUNT(*) AS NumPersons, MIN(DOB) AS EarliestDOB FROM Sample.Person;
```

Note: Schema and class/table names in InterSystems Caché SQL are not case-sensitive unless written as delimited (quoted) identifiers.


Datatype Conversion to Snowflake

When replicating data from InterSystems Caché to Snowflake, consider that some InterSystems datatypes may be converted or mapped into different Snowflake data types. For example:

| Caché SQL Datatype | Typical Snowflake Mapping |
| ------------------ | ------------------------- |
| INTEGER            | NUMBER                    |
| NUMERIC            | NUMBER                    |
| VARCHAR(n)         | VARCHAR                   |
| DATE               | DATE                      |
| TIMESTAMP          | TIMESTAMP_NTZ             |
| BIT (Boolean)      | BOOLEAN                   |
| FLOAT, DOUBLE      | FLOAT                     |
  • Some Caché-specific datatypes (like references or collections) may require flattening or transformation before Snowflake ingestion.
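The mapping above can be encoded in a small lookup table to pre-generate Snowflake DDL from Caché column metadata. This is an illustrative sketch: the table and column names are hypothetical, and unmapped types fall back to VARCHAR, which you may want to tighten for your own schema:

```python
# Mapping mirroring the table above; review it against your own schema
# before relying on it.
CACHE_TO_SNOWFLAKE = {
    "INTEGER": "NUMBER",
    "NUMERIC": "NUMBER",
    "VARCHAR": "VARCHAR",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP_NTZ",
    "BIT": "BOOLEAN",
    "FLOAT": "FLOAT",
    "DOUBLE": "FLOAT",
}

def snowflake_ddl(table, columns):
    """Build a CREATE TABLE statement from (name, cache_type) pairs.

    Parameterized types such as VARCHAR(50) are reduced to their base
    name before lookup; types without a known mapping fall back to
    VARCHAR so the load can proceed.
    """
    defs = []
    for name, ctype in columns:
        base = ctype.split("(")[0].strip().upper()
        defs.append(f"{name} {CACHE_TO_SNOWFLAKE.get(base, 'VARCHAR')}")
    return f"CREATE TABLE {table} (\n  " + ",\n  ".join(defs) + "\n);"
```

For example, snowflake_ddl("PERSON", [("ID", "INTEGER"), ("DOB", "DATE")]) produces a CREATE TABLE statement with ID typed as NUMBER and DOB as DATE.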

Preferred Data Loading Pattern

The recommended approach for data loading is:

  1. Once-off Initial Load: Extract and replicate all available data from Caché up to a certain point.
  2. Incremental Loads: After the initial load, extract only new or changed data based on a filter (e.g., timestamp or incrementing key).

The Database Query component is used for both types of loads.


Initial Load Usage

During an initial load, the SQL in the Database Query component has no filter clause, so it captures the full data set:

SELECT ID, Name, DOB, LastModified
FROM Sample.Person;


Incremental Load Usage

During incremental loads, the Database Query contains a filter to select only the changed or new records:

SELECT ID, Name, DOB, LastModified
FROM Sample.Person
WHERE LastModified > ?

  • The parameter (?) is replaced at runtime with the value of the latest loaded LastModified timestamp.
  • This approach minimizes the data volume extracted and transferred.

Read more on the Incremental Load Data Replication Strategy in Matillion Exchange.


Tip: Use the same Database Query component for both initial and incremental processes, altering the SQL as required for each phase. Make sure to manage checkpoints (such as the latest timestamp or key) to track progress across loads.

Data Integration Architecture

Loading data in advance of integration represents a "divide and conquer" approach to data integration, as it separates the process into distinct steps: loading and then transforming the data. This is a key advantage of the ELT (Extract, Load, Transform) architecture. By first loading raw data into Snowflake, organizations can then focus on data transformations and integrations as a separate phase, enabling greater flexibility and efficiency. Since data integration inherently requires data transformation, the most effective way to accomplish this in Snowflake is through data transformation pipelines. These pipelines allow for systematic, repeatable transformation logic to be applied as part of the integration process. Furthermore, a significant benefit of the ELT approach is that all data transformation and integration activities occur natively inside the Snowflake database. This ensures that processing is fast, scalable, and available on demand, without the need to provision or manage extra data processing infrastructure, thereby keeping operational costs low.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.