Integrate data from InterSystems IRIS to Amazon Redshift using Matillion

Our InterSystems IRIS to Redshift connector seamlessly transfers your data to Redshift within minutes, ensuring it remains current—all without the need for manual coding or managing intricate ETL scripts.

Extracting data from InterSystems IRIS to Amazon Redshift

Extracting data from InterSystems IRIS is a common requirement for organizations seeking to consolidate data for analytics or reporting, particularly when leveraging robust cloud data warehouses like Amazon Redshift. Moving data between these environments can improve access to near-real-time insights and support advanced business intelligence workflows. In this article, we will guide you through the process of extracting data from InterSystems IRIS and loading it into Redshift. We will begin by walking through the steps for creating an identity in InterSystems IRIS to enable secure and managed connectivity. Next, Matillion users will learn how to check for, or acquire, the necessary JDBC driver to facilitate the connection. We will then address network connectivity requirements between the source and target systems, ensuring that your data movement pipeline can operate smoothly. Finally, we will cover techniques for querying your InterSystems IRIS data, both for initial data loads and ongoing incremental extraction. Whether you are integrating enterprise applications or powering business analytics, this article will help you set up a robust data transfer process between InterSystems IRIS and Amazon Redshift.


What is InterSystems IRIS?

InterSystems IRIS is a multidimensional, high-performance data platform designed for developing and deploying data-intensive applications that require seamless interoperability, real-time analytics, and high availability. Built with a hybrid architecture, it supports multiple data models, including relational, object, document, and key-value paradigms within a single, unified engine. InterSystems IRIS enables rapid ingest and concurrent processing of transactional and analytic workloads by combining an ACID-compliant database engine, horizontal and vertical scalability, and in-memory capabilities. Native support for REST, JDBC, ODBC, and messaging standards—alongside embedded analytics and AI integration—makes IRIS suitable for complex, mission-critical environments across healthcare, financial services, and logistics industries.

What is Amazon Redshift?

Amazon Redshift is a fully managed, petabyte-scale cloud data warehouse service designed to efficiently handle large-scale data analytics workloads. Built on advanced columnar storage and massively parallel processing (MPP) architecture, Redshift enables organizations to execute complex queries across extensive datasets with high performance and low latency. It seamlessly integrates with other AWS services, supports standard SQL, and offers features such as automatic scaling, workload management, result caching, encryption, and fault tolerance. Redshift’s ecosystem includes tools for data ingestion, transformation, and visualization, making it a robust choice for enterprises aiming to implement secure, scalable, and cost-effective analytics solutions in the cloud.

Why Move Data from InterSystems IRIS into Amazon Redshift

Advantages of Transferring Data from InterSystems IRIS to Amazon Redshift for Advanced Analytics

A data engineer or architect might choose to copy data from InterSystems IRIS into Amazon Redshift for several compelling reasons. Firstly, InterSystems IRIS often holds critical and potentially valuable data, such as healthcare records or financial transactions, that can offer meaningful insights when analyzed. However, the true value of this data is often realized only when it is integrated with information from other systems, enabling more comprehensive analytics, reporting, or machine learning. By transferring data into Amazon Redshift, an organization can leverage a scalable, high-performance analytics platform specifically designed for such integration tasks. This approach allows data engineers to perform complex queries and large-scale data transformations without placing additional computational or operational load on the IRIS system, thereby preserving its performance and reliability for transactional use.

Creating a User in InterSystems IRIS

This guide describes how to create a new user identity in an InterSystems IRIS database environment using both the Management Portal and an SQL script.

Using the Management Portal

  1. Log in to the InterSystems IRIS Management Portal with administrative privileges.

  2. Navigate to System Administration > Security > Users.

  3. Click the Create New User button.

  4. Fill out the required fields:

    • User Name: Unique identifier for the user (e.g., myuser).
    • Full Name: Descriptive name for the user.
    • Password: Set the initial password.
    • Optionally configure roles, account status, and other properties according to your requirements.
  5. Click Save to create the new account.

Using SQL Scripts

You can also create a user via the terminal or any SQL client connected to your IRIS instance.

Example SQL Script

CREATE USER myuser IDENTIFIED BY 'MyStrongP@ssw0rd';
GRANT %DB_IRISSAMPLE TO myuser; -- Grant access to a namespace/role (optional)

Notes:

  • Replace myuser and 'MyStrongP@ssw0rd' with the desired username and password.
  • %DB_IRISSAMPLE is an example role or namespace permission; change as needed for your environment.
  • You may need adequate privileges (%Admin_Security or similar) to execute user creation commands.

Additional SQL Options

  • Unlock a disabled account:
    ALTER USER myuser ACCOUNT UNLOCK;
  • Assign multiple roles:
    GRANT %DB_IRISSAMPLE, %DB_USER TO myuser;

For further customization rules on passwords, account expiry, or audit settings, consult the InterSystems IRIS Security Documentation.

Installing the JDBC driver

At the time of writing, the JDBC driver for InterSystems IRIS is not bundled with Matillion Data Productivity Cloud due to licensing constraints and redistribution restrictions. This means that administrators must obtain the driver separately and upload it manually to their Matillion environment before it can be used for database connectivity.

To get started, follow these steps:

  1. Download the JDBC Driver
    Visit the official InterSystems IRIS product page: https://www.intersystems.com/products/intersystems-iris/. On this page, look for the "Download" or "Product Downloads" section.
    • Be sure to locate a Type 4 JDBC driver (also referred to as a “pure Java” JDBC driver), as this is preferred for maximum portability and compatibility with cloud-based platforms like Matillion.
    • You may be required to register for an InterSystems account or accept license terms before you can access the download.
  2. Upload the Driver to Your Matillion Agent
    While Matillion does not distribute the IRIS JDBC driver itself, the process to upload any external JDBC driver is straightforward. For detailed instructions and best practices, refer to: https://docs.matillion.com/data-productivity-cloud/agent/docs/uploading-external-drivers/.
    The core steps are uploading the JDBC JAR file to your Matillion Agent, following the interface prompts to register the new connection type, and restarting the agent service if required.
  3. Configure and Use the Driver
    Once the driver JAR has been uploaded and registered within Matillion Data Productivity Cloud, create or edit a database connection using the newly available InterSystems IRIS JDBC driver option. Comprehensive instructions for configuring and running database queries with newly uploaded JDBC drivers are available here: https://docs.matillion.com/data-productivity-cloud/designer/docs/database-query/.

By following these steps and utilizing the referenced resources, you can successfully enable InterSystems IRIS JDBC support in your Matillion Data Productivity Cloud environment.
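
Before uploading, it can be useful to sanity-check that the JAR you downloaded actually contains a JDBC driver. The Python sketch below (illustrative only; the function name is our own) reads the standard META-INF/services/java.sql.Driver entry that Type 4 drivers publish for Java's service loader, falling back to a filename scan:

```python
import zipfile

def find_driver_classes(jar_path):
    """List candidate java.sql.Driver implementations declared in a JDBC JAR."""
    with zipfile.ZipFile(jar_path) as jar:
        try:
            # The service-loader entry names the driver class(es) directly.
            services = jar.read("META-INF/services/java.sql.Driver")
            return [line.strip() for line in services.decode().splitlines() if line.strip()]
        except KeyError:
            # No service entry: fall back to scanning for *Driver* class files.
            return [n for n in jar.namelist() if n.endswith(".class") and "Driver" in n]
```

For the IRIS driver, this typically reports a class such as com.intersystems.jdbc.IRISDriver, but verify against the JAR you actually downloaded.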

Checking network connectivity

To ensure successful connectivity between Matillion Data Productivity Cloud and your InterSystems IRIS database, you must configure your database and network to allow incoming connections from the Matillion agent; the specific firewall or allow-list changes required depend on your chosen deployment type (Full SaaS or Hybrid SaaS).

Additionally, if the InterSystems IRIS database is being referenced by a DNS hostname rather than an IP address, the Full SaaS or Hybrid SaaS agent must have the necessary network access and configuration to resolve that DNS address successfully.
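
A quick way to confirm both DNS resolution and the network path from the agent's environment is a plain TCP reachability check. The sketch below is generic (the function name is ours, and the default IRIS superserver port 1972 is an assumption; substitute the port your instance actually listens on):

```python
import socket

def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if DNS resolution and a TCP connect to host:port both succeed."""
    try:
        # create_connection resolves the hostname and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures (gaierror) and refused/timed-out connects
        return False

# Example: check the assumed default IRIS superserver port from the agent host.
# can_reach("iris.example.internal", 1972)
```

If this returns False for a hostname that works from elsewhere, the problem is usually DNS configuration or a firewall rule on the path between the agent and the database.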

Querying Data from InterSystems IRIS Database

This documentation provides guidance for technical users querying data from an InterSystems IRIS database, highlighting usage with SQL statements, datatype considerations, and best practices for data loading (specifically for ETL integration scenarios such as with the Matillion ELT platform).

Example SQL SELECT Statements

Below are some sample SQL queries you might execute against an InterSystems IRIS database:

```
-- Select all records from a table
SELECT * FROM Patient;

-- Select specific columns
SELECT PatientID, Name, DateOfBirth
FROM Patient
WHERE DateOfBirth >= '2000-01-01';

-- Using aggregate functions
SELECT City, COUNT(*) AS PatientCount
FROM Patient
GROUP BY City;

-- Filtering with joins
SELECT p.Name, a.AppointmentDate
FROM Patient p
JOIN Appointment a ON p.PatientID = a.PatientID
WHERE a.AppointmentDate BETWEEN '2024-01-01' AND '2024-06-01';
```

Data Type Conversion Considerations (IRIS vs. Redshift)

When extracting/interfacing data between InterSystems IRIS and Amazon Redshift, be aware that datatype conversions may occur. For example:

| InterSystems IRIS Data Type    | Typical Redshift Mapping    |
| ------------------------------ | --------------------------- |
| INTEGER                        | INTEGER                     |
| DECIMAL                        | NUMERIC                     |
| VARCHAR                        | VARCHAR                     |
| DATE                           | DATE                        |
| TIMESTAMP                      | TIMESTAMP                   |
| BOOLEAN                        | BOOLEAN                     |
| %Library.GlobalCharacterStream | VARCHAR (with size limits)  |

Tip: Pay special attention to large string/blob datatypes (such as %Library.GlobalCharacterStream or %Stream.GlobalCharacter), as Redshift limits VARCHAR columns to 65,535 bytes; longer values must be truncated or split.
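
When generating target DDL or validating extracted metadata, the mapping above can be expressed as a simple lookup. This is an illustrative sketch (the dictionary and function names are ours), defaulting unknown and stream types to Redshift's maximum VARCHAR:

```python
# Illustrative mapping only; verify against your own IRIS schema and Redshift limits.
IRIS_TO_REDSHIFT = {
    "INTEGER": "INTEGER",
    "DECIMAL": "NUMERIC",
    "VARCHAR": "VARCHAR",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP",
    "BOOLEAN": "BOOLEAN",
    # Stream (clob-like) types must fit Redshift's 65,535-byte VARCHAR maximum.
    "%Library.GlobalCharacterStream": "VARCHAR(65535)",
}

def redshift_type(iris_type: str) -> str:
    """Return a target Redshift type, defaulting to the widest VARCHAR."""
    return IRIS_TO_REDSHIFT.get(iris_type, "VARCHAR(65535)")
```

Falling back to VARCHAR(65535) is a conservative default; a production pipeline should instead flag unmapped types for review.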

Best Patterns for ETL Data Loading

The recommended loading pattern is:

  • Once-off initial load (also called “full extract”)
  • Incremental load (replicating only new or changed data since the last load)

Matillion’s Database Query component is recommended for both load types.

Initial Load (Full Extract)

During an initial load, you extract all data from the source table. No filter clause is used. For example:

SELECT PatientID, Name, DateOfBirth, LastUpdated
FROM Patient;

Configure the Database Query component in Matillion without a WHERE clause.

Incremental Load (CDC or Delta Extract)

In incremental loads, you extract only new or updated rows, typically by filtering on a LastUpdated timestamp or another monotonically increasing field.

For example, to extract only records updated since a certain point:

SELECT PatientID, Name, DateOfBirth, LastUpdated
FROM Patient
WHERE LastUpdated > '2024-06-19 12:00:00';

  • The filter value ('2024-06-19 12:00:00') should be set dynamically, using the timestamp of the last successful extraction.
  • Re-use the same Database Query component pattern; toggle the filter clause based on loading mode.

For more detail on approaches and incremental load strategies, visit: Incremental Load Data Replication Strategy (Matillion).


Note: Carefully validate data type mappings and test queries in a development environment prior to production use, especially when moving large datasets or sensitive patient/clinical data.

Data Integration Architecture

Loading data into Amazon Redshift in advance of integration exemplifies the "divide and conquer" principle by splitting the data pipeline into two discrete steps: loading (extract and load) and integrating (transform). This approach is fundamental to the ELT (Extract, Load, Transform) architecture, offering notable advantages over traditional ETL workflows. Once data is loaded into Redshift, data integration typically requires various transformations, which are most robustly implemented through data transformation pipelines designed within the database. A further advantage of the ELT model is that both transformation and integration processes occur directly within the target Redshift environment. This not only results in fast and on-demand data processing but also leverages Redshift's native scalability and parallelism, while eliminating the need for, and the costs associated with, external data processing infrastructure.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.