Extracting data from InterSystems IRIS to Snowflake
Extracting data from InterSystems IRIS is an important step for organizations looking to integrate their healthcare or enterprise data with modern cloud-based data platforms such as Snowflake. Achieving a reliable data pipeline involves several key stages, from establishing the necessary permissions in InterSystems IRIS to ensuring connectivity and efficient data transfer. In this article, we will guide you through the essential steps required for a successful data extraction and loading process. We will begin with creating the appropriate identity in InterSystems IRIS, which is necessary for secure access. Next, for users of Matillion, we will cover how to check for or acquire the correct JDBC driver to facilitate connectivity between systems. We will also discuss the critical aspect of establishing network connectivity from the source (InterSystems IRIS) to the target (Snowflake). Finally, we will outline methods for querying data, addressing both initial full loads and ongoing incremental updates. By following this guide, you will gain the knowledge needed to set up a robust and secure data flow from InterSystems IRIS to Snowflake, making your enterprise data more accessible and actionable.
What is InterSystems IRIS?
InterSystems IRIS is a high-performance, multi-model data platform integrating relational, object, document, and key-value models in one engine. It supports ACID transactions, SQL and NoSQL access, and scales both horizontally and vertically. IRIS features embedded interoperability—integrated messaging, API management—for seamless data integration, along with built-in AI and analytics tools for advanced processing within the database. Its robust engine efficiently handles complex, hybrid workloads, making IRIS ideal for real-time, mission-critical, and data-intensive applications. This unified platform simplifies development and deployment, positioning InterSystems IRIS as a versatile solution for modern data-management needs.
What is Snowflake?
Snowflake is a cloud-native data platform offering fully managed, multi-cluster architecture for high-performance data warehousing, analytics, and secure sharing. Designed for public clouds (AWS, Azure, Google Cloud), it leverages elastic scalability and separates storage from compute, enabling fast, concurrent queries. Its architecture features micro-partitioned, columnar storage and supports time travel, zero-copy cloning, and secure data sharing. Snowflake handles both structured and semi-structured data (e.g., JSON, Avro, Parquet) and is ANSI SQL-compliant, making it accessible for teams familiar with standard SQL.
Why Move Data from InterSystems IRIS into Snowflake
Advantages of Replicating InterSystems IRIS Data to Snowflake for Advanced Analytics
A data engineer or architect may choose to copy data from InterSystems IRIS into Snowflake for several compelling reasons. Firstly, InterSystems IRIS often serves as a system of record, housing data that is both mission-critical and potentially valuable for broader analytical purposes. By integrating this data with information from other sources, organizations can unlock deeper insights and realize the full potential of their data assets. Utilizing Snowflake as the platform for data integration and analysis offers significant advantages, especially in terms of scalability and performance. By offloading analytical workloads to Snowflake, organizations can avoid imposing additional processing demands on the InterSystems IRIS environment, thereby preserving its performance and stability for transactional operations. This strategy ensures efficient data integration while maintaining optimal operation of both systems.
Creating a User in InterSystems IRIS
To create a new user in InterSystems IRIS, you can use either the Management Portal GUI or run SQL scripts directly. Below are concise instructions for both approaches, with a focus on command-line execution via SQL.
Prerequisites
- Sufficient privileges: You must be logged in as a user with the %Admin_Security role or equivalent.
- A connection to the IRIS instance using the SQL interface.
Creating a User via SQL
InterSystems IRIS supports user management directly through SQL DDL. To create a user, execute a CREATE USER statement:

```sql
CREATE USER username IDENTIFY BY password
```

Optional attributes such as the user's full name and email address can be set afterwards through the Management Portal or the Security.Users class in the %SYS namespace.

Example: Creating a User

The following statement creates a new user appuser with the password StrongPassword!:

```sql
CREATE USER appuser IDENTIFY BY 'StrongPassword!'
```
Assigning Roles to the User
After creating the user, you can assign existing roles to grant appropriate permissions. For example, to assign the
%DB_MYAPProle:
CALL %SYSTEM_SQLSecurity.AddUserToRole(
'appuser',
'%DB_MYAPP'
)
Repeat for other roles as needed.
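Because each role requires its own statement (IRIS also supports the standard SQL GRANT syntax for roles), it can be convenient to script their generation when provisioning several roles at once. The sketch below is a minimal Python illustration; the user and role names are hypothetical, and it only builds the SQL text rather than executing it against IRIS.

```python
def grant_statements(user, roles):
    """Build one IRIS SQL GRANT statement per role for the given user."""
    return [f"GRANT {role} TO {user}" for role in roles]

# Hypothetical example: grant two roles to the user created above
for statement in grant_statements("appuser", ["%DB_MYAPP", "%SQL"]):
    print(statement)
```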
Alternative: Creating a User via the Management Portal
- Log in to the Management Portal (http://&lt;hostname&gt;:52773/csp/sys/UtilHome.csp).
- Navigate to: System Administration > Security > Users
- Click Create New User.
- Fill out the required information (Username, Password, etc.).
- Assign roles and save the new user.
For further details, see InterSystems IRIS Security Administration Guide.
Installing the JDBC Driver
The InterSystems IRIS JDBC driver is required to connect Matillion Data Productivity Cloud to your InterSystems IRIS database. However, please note that at the time of writing, the JDBC driver is not included by default in Matillion Data Productivity Cloud due to licensing or redistribution restrictions. As a result, you will need to download and install the driver yourself by following the steps below.
1. Download the InterSystems IRIS JDBC Driver
- Visit https://www.intersystems.com/products/intersystems-iris/ to access the InterSystems IRIS product page.
- Locate the download section for client connectivity or developer resources.
- Look specifically for the Type 4 JDBC driver, a pure Java implementation that offers the greatest compatibility and simplicity with Matillion Data Productivity Cloud, and download the corresponding JAR file.
Note: The download process may require you to register or accept license terms on the InterSystems website.
2. Upload the Driver to Matillion Data Productivity Cloud
Once you have obtained the JDBC driver JAR file, refer to Matillion's documented procedure for installing external drivers into an agent within your environment:
- Follow the detailed instructions at: https://docs.matillion.com/data-productivity-cloud/agent/docs/uploading-external-drivers/
This documentation covers:
- How to access the agent configuration interface.
- The process for uploading new driver files (such as the Type 4 JDBC JAR) to the correct location in the agent.
- Important considerations regarding driver versioning and potential restarts required.
3. Using the JDBC Driver in Matillion Data Productivity Cloud
After the driver installation is complete, you can configure and test your InterSystems IRIS connections within Matillion Data Productivity Cloud. Matillion's driver usage documentation provides guidance for:
- Setting up a database connection in the Matillion Designer.
- Selecting the correct driver during connection configuration.
- Providing the necessary connection details for InterSystems IRIS.
By following these steps and referencing the provided documentation, you can enable seamless connectivity between Matillion Data Productivity Cloud and your InterSystems IRIS database using the official JDBC driver.
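When configuring the connection, the IRIS Type 4 JDBC driver expects a URL of the form jdbc:IRIS://&lt;host&gt;:&lt;port&gt;/&lt;namespace&gt;, where the superserver port defaults to 1972. The Python sketch below simply assembles such a URL; the hostname and namespace are placeholder values, not part of any real deployment.

```python
def iris_jdbc_url(host, namespace, port=1972):
    """Assemble an InterSystems IRIS Type 4 JDBC connection URL.

    The URL format is jdbc:IRIS://<host>:<port>/<namespace>;
    1972 is the default IRIS superserver port.
    """
    return f"jdbc:IRIS://{host}:{port}/{namespace}"

# Placeholder host and namespace for illustration
print(iris_jdbc_url("iris.example.com", "USER"))
# jdbc:IRIS://iris.example.com:1972/USER
```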
Checking network connectivity
To ensure successful connectivity between Matillion Data Productivity Cloud and your InterSystems IRIS database, you must configure the database to accept incoming connections, with requirements depending on your deployment type:
- Full SaaS Agent Configuration: For Full SaaS agent deployments, the InterSystems IRIS database must allow incoming connections from the IP addresses used by Matillion's SaaS infrastructure. These IP addresses are listed in the Matillion Allowed IP Addresses documentation. Ensure these addresses are whitelisted in the network access controls, firewalls, or security groups protecting your database.
- Hybrid SaaS Deployment: For Hybrid SaaS deployments, you must allow incoming connections from your own Virtual Private Cloud (VPC) where the Matillion agent operates. Make sure that the relevant IP range(s) associated with your VPC are permitted to access the InterSystems IRIS database. Matillion provides network access check utilities to help validate connectivity: see Check Network Access in the Matillion documentation.
Additionally, if your InterSystems IRIS database is referenced using a DNS name rather than a direct IP address, ensure that the Matillion agent host (whether Cloud or Hybrid) can resolve the database’s DNS address. Proper DNS resolution is required for the agent to initiate and maintain a connection.
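As a quick sanity check before configuring the agent, you can verify from a host on the relevant network that the database's DNS name resolves and that its port accepts TCP connections. This minimal Python sketch assumes the default IRIS superserver port 1972; substitute your own hostname and port.

```python
import socket

def can_reach(host, port, timeout=5.0):
    """Return True if `host` resolves and accepts a TCP connection on `port`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failure, connection refusal, and timeout
        return False

if __name__ == "__main__":
    # Placeholder hostname and the default IRIS superserver port
    print(can_reach("iris.example.com", 1972))
```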
Querying Data from InterSystems IRIS
This guide explains how to query data from an InterSystems IRIS database using SQL, describes data type considerations (especially when moving data to Snowflake), and outlines best practices for performing initial and incremental data loads using the JDBC Load component.
Example SQL SELECT Statements
InterSystems IRIS supports standard ANSI SQL syntax, allowing you to use familiar queries, such as:
```sql
-- Basic SELECT statement
SELECT PatientID, Name, DateOfBirth FROM MyApp.Patients;

-- Applying a WHERE clause
SELECT PatientID, Name FROM MyApp.Patients WHERE DateOfBirth > '2000-01-01';

-- Using aggregation
SELECT Gender, COUNT(*) AS PatientCount FROM MyApp.Patients GROUP BY Gender;
```
You can execute these statements directly via IRIS SQL clients, through programming APIs, or within ETL tools like Matillion by configuring the JDBC Load component.
Data Type Conversion: IRIS and Snowflake
When transferring data from IRIS to a platform like Snowflake, data type conversion is often required. Here are some example mappings:
| InterSystems IRIS | Snowflake |
|---|---|
| INTEGER | NUMBER(38,0) |
| NUMERIC / DECIMAL | NUMBER(p,s) |
| VARCHAR(n) / NVARCHAR(n) | VARCHAR(n) |
| DATE | DATE |
| TIME | TIME |
| TIMESTAMP / DATETIME | TIMESTAMP_NTZ |
| BIT | BOOLEAN |
Automated ETL tools may handle most conversions, but always review and test for data precision, scale, and semantic changes across systems.
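The mapping table above can be expressed as a small lookup helper, which is handy when generating Snowflake DDL from IRIS metadata. This Python sketch is illustrative only: it covers just the types listed in the table, and the precision, scale, and length handling is a simplifying assumption rather than a complete implementation.

```python
# Illustrative IRIS -> Snowflake type mapping, covering only the table above
IRIS_TO_SNOWFLAKE = {
    "INTEGER": "NUMBER(38,0)",
    "NUMERIC": "NUMBER",      # precision/scale appended separately
    "DECIMAL": "NUMBER",
    "VARCHAR": "VARCHAR",     # length appended separately
    "NVARCHAR": "VARCHAR",
    "DATE": "DATE",
    "TIME": "TIME",
    "TIMESTAMP": "TIMESTAMP_NTZ",
    "DATETIME": "TIMESTAMP_NTZ",
    "BIT": "BOOLEAN",
}

def snowflake_type(iris_type, precision=None, scale=None, length=None):
    """Map an IRIS column type name to a Snowflake type string."""
    base = IRIS_TO_SNOWFLAKE[iris_type.upper()]
    if base == "NUMBER" and precision is not None:
        return f"NUMBER({precision},{scale or 0})"
    if base == "VARCHAR" and length is not None:
        return f"VARCHAR({length})"
    return base

print(snowflake_type("DATETIME"))        # TIMESTAMP_NTZ
print(snowflake_type("DECIMAL", 10, 2))  # NUMBER(10,2)
```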
Pattern: Initial Load and Incremental Loads
A robust replication approach consists of a once-off initial load followed by incremental loads:
- Initial Load
  - Extracts all data from the source IRIS table.
  - The JDBC Load component executes a SELECT statement without a filter clause, for example:

    ```sql
    SELECT * FROM MyApp.Patients;
    ```

- Incremental Load
  - Extracts only new or changed records since the last load.
  - The JDBC Load component includes a filter (e.g., using a timestamp column):

    ```sql
    SELECT * FROM MyApp.Patients
    WHERE LastUpdated > '2024-06-01 00:00:00';
    ```

  - The filter value (the bookmark) is advanced each time you run the incremental load.

Use the same JDBC Load component in both cases, switching the filter dynamically according to the context.
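The switch between the initial and incremental query can be sketched as a small helper that builds the SELECT statement from the current bookmark value. This Python sketch is purely illustrative: the table and column names match the examples above, and persisting the bookmark (for example in a pipeline variable or control table) is out of scope.

```python
def build_extract_query(table, bookmark_column, bookmark=None):
    """Build the extraction query: a full load when no bookmark exists yet,
    otherwise an incremental load filtered on the bookmark column."""
    query = f"SELECT * FROM {table}"
    if bookmark is not None:
        query += f" WHERE {bookmark_column} > '{bookmark}'"
    return query

# First run: no bookmark yet, so perform the full initial load
print(build_extract_query("MyApp.Patients", "LastUpdated"))
# Subsequent runs: filter on the saved bookmark value
print(build_extract_query("MyApp.Patients", "LastUpdated", "2024-06-01 00:00:00"))
```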
For more info, see the Incremental Load & Data Replication Strategy article (Matillion Exchange).
Data Integration Architecture
Loading data into Snowflake in advance of integration exemplifies the "divide and conquer" strategy, an inherent strength of the Extract, Load, and Transform (ELT) architecture. By splitting the process into two distinct steps, loading raw data first and transforming and integrating it later, organizations gain flexibility and improved control over their data workflows.

Data integration, which relies on data transformations to harmonize, cleanse, and merge datasets, is most effectively carried out through dedicated data transformation pipelines. These pipelines bring reproducibility, scalability, and transparency to transformation processes.

Another notable advantage of the ELT approach is that transformations and integration are performed natively inside the Snowflake database. This in-database processing is fast and on-demand, benefiting from Snowflake's scalability and parallelism, and it eliminates the need for separate data processing infrastructure, helping reduce costs and simplify operations.