Integrate data from InterSystems IRIS to Snowflake using Matillion

The InterSystems IRIS to Snowflake connector enables seamless, automated data transfer to Snowflake in just minutes, ensuring your information remains current without requiring manual coding or complex ETL processes.


Extracting data from InterSystems IRIS to Snowflake

Extracting data from InterSystems IRIS is an important step for organizations looking to integrate their healthcare or enterprise data with modern cloud-based data platforms such as Snowflake. Achieving a reliable data pipeline involves several key stages, from establishing the necessary permissions in InterSystems IRIS to ensuring connectivity and efficient data transfer. In this article, we will guide you through the essential steps required for a successful data extraction and loading process.

We will begin with creating the appropriate identity in InterSystems IRIS, which is necessary for secure access. Next, for users of Matillion, we will cover how to check for or acquire the correct JDBC driver to facilitate connectivity between systems. We will also discuss the critical aspect of establishing network connectivity from the source (InterSystems IRIS) to the target (Snowflake). Finally, we will outline methods for querying data, addressing both initial full loads and ongoing incremental updates.

By following this guide, you will gain the knowledge needed to set up a robust and secure data flow from InterSystems IRIS to Snowflake, making your enterprise data more accessible and actionable.


What is InterSystems IRIS?

InterSystems IRIS is a high-performance, multi-model data platform integrating relational, object, document, and key-value models in one engine. It supports ACID transactions, SQL and NoSQL access, and scales both horizontally and vertically. IRIS features embedded interoperability—integrated messaging, API management—for seamless data integration, along with built-in AI and analytics tools for advanced processing within the database. Its robust engine efficiently handles complex, hybrid workloads, making IRIS ideal for real-time, mission-critical, and data-intensive applications. This unified platform simplifies development and deployment, positioning InterSystems IRIS as a versatile solution for modern data-management needs.


What is Snowflake?

Snowflake is a cloud-native data platform offering fully managed, multi-cluster architecture for high-performance data warehousing, analytics, and secure sharing. Designed for public clouds (AWS, Azure, Google Cloud), it leverages elastic scalability and separates storage from compute, enabling fast, concurrent queries. Its architecture features micro-partitioned, columnar storage and supports time travel, zero-copy cloning, and secure data sharing. Snowflake handles both structured and semi-structured data (e.g., JSON, Avro, Parquet) and is ANSI SQL-compliant, making it accessible for teams familiar with standard SQL.

Why Move Data from InterSystems IRIS into Snowflake

Advantages of Replicating InterSystems IRIS Data to Snowflake for Advanced Analytics

A data engineer or architect may choose to copy data from InterSystems IRIS into Snowflake for several compelling reasons. Firstly, InterSystems IRIS often serves as a system of record, housing data that is both mission-critical and potentially valuable for broader analytical purposes. By integrating this data with information from other sources, organizations can unlock deeper insights and realize the full potential of their data assets. Utilizing Snowflake as the platform for data integration and analysis offers significant advantages, especially in terms of scalability and performance. By offloading analytical workloads to Snowflake, organizations can avoid imposing additional processing demands on the InterSystems IRIS environment, thereby preserving its performance and stability for transactional operations. This strategy ensures efficient data integration while maintaining optimal operation of both systems.

Creating a User in InterSystems IRIS

To create a new user in InterSystems IRIS, you can use either the Management Portal GUI or run SQL scripts directly. Below are concise instructions for both approaches, with a focus on command-line execution via SQL.

Prerequisites

  • Sufficient privileges: You must be logged in as a user with the %Admin_Security role or equivalent.
  • A connection to the IRIS instance using the SQL interface.

Creating a User via SQL

InterSystems IRIS supports user management directly through SQL. To create a user, execute the CREATE USER statement:

CREATE USER username IDENTIFY BY password

Example: Creating a User

The following statement creates a new user named appuser with the password StrongPassword! (quote the password if it contains special characters):

CREATE USER appuser IDENTIFY BY "StrongPassword!"

Additional attributes, such as the user's full name, email address, and enabled status, can be set afterwards through the Management Portal or the Security.Users class in the %SYS namespace.

Assigning Roles to the User

After creating the user, grant existing roles to provide the appropriate permissions. For example, to assign the %DB_MYAPP role:

GRANT %DB_MYAPP TO appuser

Repeat for other roles as needed.

Alternative: Creating a User via the Management Portal

  1. Log in to the Management Portal (http://<server>:52773/csp/sys/UtilHome.csp, where <server> is your IRIS host).
  2. Navigate to: System Administration > Security > Users
  3. Click Create New User.
  4. Fill out the required information (Username, Password, etc.).
  5. Assign roles and save the new user.

For further details, see InterSystems IRIS Security Administration Guide.

Installing the JDBC Driver

The InterSystems IRIS JDBC driver is required to connect Matillion Data Productivity Cloud to your InterSystems IRIS database. However, please note that at the time of writing, the JDBC driver is not included by default in Matillion Data Productivity Cloud due to licensing or redistribution restrictions. As a result, you will need to download and install the driver yourself by following the steps below.

1. Download the InterSystems IRIS JDBC Driver

  1. Visit https://www.intersystems.com/products/intersystems-iris/ to access the InterSystems IRIS product page.
  2. Locate the download section for client connectivity or developer resources.
  3. Look specifically for the Type 4 JDBC driver, a pure Java implementation that offers the greatest compatibility and simplicity with Matillion Data Productivity Cloud, and download the corresponding JAR file.

Note: The download process may require you to register or accept license terms on the InterSystems website.

2. Upload the Driver to Matillion Data Productivity Cloud

Once you have obtained the JDBC driver JAR file, follow Matillion's documented procedure for installing external drivers into an agent within your environment. The documentation covers:

  • How to access the agent configuration interface.
  • The process for uploading new driver files (such as the Type 4 JDBC JAR) to the correct location in the agent.
  • Important considerations regarding driver versioning and potential restarts required.

3. Using the JDBC Driver in Matillion Data Productivity Cloud

After the driver installation is complete, you can configure and test your InterSystems IRIS connections within Matillion Data Productivity Cloud. For comprehensive instructions on referencing your newly installed JDBC driver, consult Matillion's usage guide, which includes guidance for:

  • Setting up a database connection in the Matillion Designer.
  • Selecting the correct driver during connection configuration.
  • Providing the necessary connection details for InterSystems IRIS.

By following these steps and referencing the provided documentation, you can enable seamless connectivity between Matillion Data Productivity Cloud and your InterSystems IRIS database using the official JDBC driver.
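When filling in the connection details, the InterSystems IRIS JDBC driver uses the driver class com.intersystems.jdbc.IRISDriver and a URL of the form jdbc:IRIS://host:port/namespace, where 1972 is the default superserver port. The sketch below builds such a URL; the host and namespace values are placeholders for illustration, not real endpoints:

```python
# Build an InterSystems IRIS JDBC connection URL.
# The URL pattern and driver class follow the IRIS JDBC conventions;
# host, port, and namespace values here are hypothetical examples.
IRIS_DRIVER_CLASS = "com.intersystems.jdbc.IRISDriver"

def iris_jdbc_url(host: str, namespace: str, port: int = 1972) -> str:
    """Return a JDBC URL for an InterSystems IRIS instance.

    1972 is the default IRIS superserver port used by JDBC clients.
    """
    return f"jdbc:IRIS://{host}:{port}/{namespace}"

# Example: values you would paste into Matillion's connection settings.
print(iris_jdbc_url("iris.example.com", "USER"))
# jdbc:IRIS://iris.example.com:1972/USER
```

The same URL string works for any JDBC client, so it is also a convenient way to sanity-check your connection details outside Matillion.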

Checking network connectivity

To ensure successful connectivity between Matillion Data Productivity Cloud and your InterSystems IRIS database, you must configure the database to accept incoming connections, with requirements depending on your deployment type:

  • Full SaaS Agent Configuration:
    For Full SaaS agent deployments, the InterSystems IRIS database must allow incoming connections from the IP addresses used by Matillion’s SaaS infrastructure. These IP addresses are listed here: Matillion Allowed IP Addresses. Ensure these addresses are whitelisted in the necessary network access controls, firewalls, or security groups protecting your database.

  • Hybrid SaaS Deployment:
    For Hybrid SaaS deployments, you must allow incoming connections from your own Virtual Private Cloud (VPC) where the Matillion agent operates. Make sure that the relevant IP range(s) associated with your VPC are permitted to access the InterSystems IRIS database. Matillion provides network access check utilities to help validate connectivity: Check Network Access.

Additionally, if your InterSystems IRIS database is referenced using a DNS name rather than a direct IP address, ensure that the Matillion agent host (whether Cloud or Hybrid) can resolve the database’s DNS address. Proper DNS resolution is required for the agent to initiate and maintain a connection.
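Both requirements (DNS resolution and TCP reachability) can be verified from the agent's network before configuring the connection. The following is an illustrative sketch; the hostname is a placeholder and 1972 is the default IRIS superserver port:

```python
import socket

def can_resolve(hostname: str) -> bool:
    """Return True if the hostname resolves to an IP address."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        return False

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (hypothetical endpoint): run this from the agent's network.
# print(can_resolve("iris.example.com"), can_connect("iris.example.com", 1972))
```

If can_resolve fails, fix DNS before investigating firewalls; if only can_connect fails, review the IP allow-lists and security groups described above.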

Querying Data from InterSystems IRIS

This guide explains how to query data from an InterSystems IRIS database using SQL, describes data type considerations (especially when moving data to Snowflake), and outlines best practices for performing initial and incremental data loads using the JDBC Load component.

Example SQL SELECT Statements

InterSystems IRIS supports standard ANSI SQL syntax, allowing you to use familiar queries, such as:

```
-- Basic SELECT statement
SELECT PatientID, Name, DateOfBirth FROM MyApp.Patients;

-- Applying a WHERE clause
SELECT PatientID, Name FROM MyApp.Patients WHERE DateOfBirth > '2000-01-01';

-- Using aggregation
SELECT Gender, COUNT(*) AS PatientCount FROM MyApp.Patients GROUP BY Gender;
```

You can execute these statements directly via IRIS SQL clients, through programming APIs, or within ETL tools like Matillion by configuring the JDBC Load component.

Data Type Conversion: IRIS and Snowflake

When transferring data from IRIS to a platform like Snowflake, data type conversion is often required. Here are some example mappings:

| InterSystems IRIS       | Snowflake     |
|-------------------------|---------------|
| INTEGER                 | NUMBER(38,0)  |
| NUMERIC / DECIMAL       | NUMBER(p,s)   |
| VARCHAR(n) / NVARCHAR(n)| VARCHAR(n)    |
| DATE                    | DATE          |
| TIME                    | TIME          |
| TIMESTAMP / DATETIME    | TIMESTAMP_NTZ |
| BIT                     | BOOLEAN       |

Automated ETL tools may handle most conversions, but always review and test for data precision, scale, and semantic changes across systems.
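As a sketch of how such a review might be automated, the mapping above can be encoded and applied when generating Snowflake DDL. The rules below mirror the table and are illustrative, not exhaustive:

```python
import re

# Illustrative IRIS -> Snowflake type mapping, mirroring the table above.
# Real ETL tools apply more nuanced rules; extend as needed.
_SIMPLE_TYPES = {
    "INTEGER": "NUMBER(38,0)",
    "DATE": "DATE",
    "TIME": "TIME",
    "TIMESTAMP": "TIMESTAMP_NTZ",
    "DATETIME": "TIMESTAMP_NTZ",
    "BIT": "BOOLEAN",
}

def iris_to_snowflake(iris_type: str) -> str:
    """Map an InterSystems IRIS SQL type to a Snowflake type."""
    t = iris_type.strip().upper()
    if t in _SIMPLE_TYPES:
        return _SIMPLE_TYPES[t]
    # NUMERIC(p,s) / DECIMAL(p,s) -> NUMBER(p,s), preserving precision/scale
    m = re.fullmatch(r"(NUMERIC|DECIMAL)\s*(\(\s*\d+\s*(,\s*\d+\s*)?\))?", t)
    if m:
        return f"NUMBER{m.group(2) or ''}"
    # VARCHAR(n) / NVARCHAR(n) -> VARCHAR(n)
    m = re.fullmatch(r"N?VARCHAR\s*(\(\s*\d+\s*\))?", t)
    if m:
        return f"VARCHAR{m.group(1) or ''}"
    raise ValueError(f"No mapping defined for IRIS type: {iris_type}")

print(iris_to_snowflake("DECIMAL(10,2)"))  # NUMBER(10,2)
print(iris_to_snowflake("NVARCHAR(255)"))  # VARCHAR(255)
```

Raising on unknown types (rather than silently defaulting to VARCHAR) surfaces gaps in the mapping early, before bad DDL reaches Snowflake.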

Pattern: Initial Load and Incremental Loads

A robust replication approach consists of a once-off initial load followed by incremental loads:

  • Initial Load
      • Extracts all data from the source IRIS table.
      • The JDBC Load component executes a SELECT statement without a filter clause, for example:

SELECT * FROM MyApp.Patients;

  • Incremental Load
      • Extracts only new or changed records since the last load.
      • The JDBC Load component includes a filter (e.g., using a timestamp column):

SELECT * FROM MyApp.Patients
    WHERE LastUpdated > '2024-06-01 00:00:00';

      • The filter value (bookmark) is advanced each time you run the incremental load.

Use the same JDBC Load component in both cases, switching the filter dynamically according to the context.
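The bookmark pattern above can be sketched in a few lines. This is an illustrative example, not Matillion's implementation: the table and column names come from the queries above, and the in-memory dict stands in for a real state store such as a control table or a pipeline variable:

```python
# Illustrative high-water-mark (bookmark) pattern for incremental loads.
# In practice the bookmark would be persisted in a control table or a
# Matillion pipeline variable, not an in-memory dict.
bookmarks: dict[str, str] = {}

def build_extract_query(table: str, ts_column: str = "LastUpdated") -> str:
    """Return a full-load query on the first run, incremental thereafter."""
    bookmark = bookmarks.get(table)
    if bookmark is None:
        return f"SELECT * FROM {table}"  # initial load: no filter
    return f"SELECT * FROM {table} WHERE {ts_column} > '{bookmark}'"

def advance_bookmark(table: str, max_ts_seen: str) -> None:
    """After a successful load, move the bookmark to the newest timestamp."""
    bookmarks[table] = max_ts_seen

print(build_extract_query("MyApp.Patients"))
# SELECT * FROM MyApp.Patients
advance_bookmark("MyApp.Patients", "2024-06-01 00:00:00")
print(build_extract_query("MyApp.Patients"))
# SELECT * FROM MyApp.Patients WHERE LastUpdated > '2024-06-01 00:00:00'
```

Advancing the bookmark only after a successful load means a failed run simply re-extracts the same window on retry, which keeps the pipeline idempotent.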

For more info, see the Incremental Load & Data Replication Strategy article (Matillion Exchange).

Data Integration Architecture

Loading data into Snowflake in advance of integration exemplifies the "divide and conquer" strategy, an inherent strength of the Extract, Load, and Transform (ELT) architecture. By splitting the process into two distinct steps—loading raw data first, then transforming and integrating it later—organizations gain flexibility and improved control over their data workflows.

Data integration, which relies on data transformations to harmonize, cleanse, and merge datasets, is most effectively carried out through dedicated data transformation pipelines. These pipelines provide reproducibility, scalability, and transparency to transformation processes.

Another notable advantage of the ELT approach is that transformations and integration are performed natively inside the Snowflake database. This in-database processing is not only fast and on-demand, benefiting from Snowflake's scalability and parallelism, but it also eliminates the need for separate data processing infrastructure—helping reduce costs and simplify operations.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.