Integrate data from CA IDMS (Broadcom) to Amazon Redshift using Matillion

Our CA IDMS to Redshift connector seamlessly migrates your data to Redshift within minutes, ensuring it remains current without requiring manual coding or complex ETL scripts.


Extracting data from CA IDMS (Broadcom) to Amazon Redshift

Extracting data from CA IDMS is a critical task for organizations seeking to modernize their data infrastructure or enable advanced analytics. Moving data from legacy mainframe systems such as CA IDMS to modern cloud data warehouses like Amazon Redshift can unlock the value of historical data and facilitate new insights. However, this process requires careful planning and attention to detail, given the unique characteristics of each system.

In this article, we will guide you through the essential steps to extract data from CA IDMS and load it into Amazon Redshift. We will begin by outlining how to create an appropriate user identity in CA IDMS to ensure secure and authorized access. For those using Matillion, we will explain how to verify that you have the necessary JDBC driver and how to acquire it if needed. We will also address network connectivity considerations, which are crucial for seamless and secure data transfer between the mainframe and Redshift.

Finally, we will discuss recommended approaches for querying your data, both for the initial full extraction and for setting up incremental loads that keep Redshift synchronized with changes in CA IDMS. Whether you are new to these platforms or looking to optimize an existing process, this article aims to provide a clear and practical roadmap.


What is CA IDMS (Broadcom)?

CA IDMS (Integrated Database Management System) is a high-performance network model database system originally developed for mainframe environments and now maintained by Broadcom. It provides robust, mission-critical data management capabilities, supporting complex database structures and high transaction volumes typical of large enterprises. IDMS emphasizes record-at-a-time data processing, navigational access paths, and schema definitions tailored for hierarchical and networked data relationships, making it well-suited for applications with intricate data interdependencies and stringent reliability requirements. Native integration with COBOL and other mainframe languages, extensive transaction management, and advanced backup and recovery features ensure uninterrupted operations and data integrity for organizations relying on legacy system architectures.


What is Amazon Redshift?

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud, designed to facilitate high-performance analysis of large datasets. It leverages columnar storage, parallel query execution, and extensive data compression to optimize query speed and cost efficiency. Redshift supports SQL queries via the PostgreSQL wire protocol, enabling easy integration with existing analytics tools and ETL processes. Its architecture allows for seamless scaling, with features like concurrency scaling and Spectrum for querying data directly in Amazon S3. Security is enforced through encryption, VPC isolation, and integration with AWS IAM, making it a robust solution for enterprises seeking scalable and secure analytical workloads.

Why Move Data from CA IDMS (Broadcom) into Amazon Redshift

Unlocking Analytics: The Benefits of Migrating CA IDMS Data to Amazon Redshift

A data engineer or architect may seek to copy data from CA IDMS into Amazon Redshift for several important reasons. First, CA IDMS often stores extensive historical and operational information that holds significant value for organizations seeking to derive business insights. However, this value is often realized fully only when the data is integrated with other enterprise data sources, enabling comprehensive analysis and deeper visibility into business processes. By transferring this valuable data into Redshift, a cloud-based data warehouse optimized for analytics workloads, organizations can efficiently combine and analyze data from multiple systems. Importantly, conducting such integration and analysis in Redshift prevents additional query and processing load on the CA IDMS platform itself, ensuring that critical transactional operations remain unaffected while still unlocking the data’s analytical potential.

Creating an Identity in CA IDMS

To define a new user in a CA IDMS database, use the CREATE USER Data Definition Language (DDL) statement. The new identity allows the user to connect and be assigned security attributes.

Below are step-by-step instructions for creating a user in CA IDMS using SQL.


1. Prepare Your Environment

Ensure you have the necessary administrative privileges (usually the DBA role) to create users.

2. Connect to the IDMS Database

Use a supported interface, such as IDMS Service Administrator, OCF, or a mainframe terminal environment equipped with an SQL prompt.

CONNECT TO system_name
USER DBA
USING password;

Replace system_name with your DBNAME, and provide valid DBA credentials.

3. Create a New User

Issue the CREATE USER statement. You must supply a user name, and you can optionally specify attributes such as a password, a default schema, and allowed privileges.

CREATE USER new_user
    IDENTIFIED BY password
    DEFAULT SCHEMA schema_name;

  • new_user: The user's unique IDMS name (up to 32 characters).
  • password: Initial password for the user.
  • schema_name: (Optional) Sets the user's default schema.

Example

CREATE USER analyst01
    IDENTIFIED BY Xw39sjk!
    DEFAULT SCHEMA payroll_schema;

4. Grant Roles or Privileges

You must grant appropriate privileges to the new user. For example, to grant the READ privilege on a specific schema:

GRANT READ ON SCHEMA payroll_schema TO analyst01;

Or, to assign a role (if your system uses roles):

GRANT payroll_role TO analyst01;

5. Commit the Changes

On some systems, changes are made within a transaction and must be explicitly committed:

COMMIT;


Note: User creation in CA IDMS may be subject to your site's specific security authorization and operating procedures. Always verify local guidelines before performing security administration tasks.

Installing the JDBC Driver

The CA IDMS JDBC driver enables connections between Matillion Data Productivity Cloud and CA IDMS databases. Please note that the JDBC driver for CA IDMS is not pre-installed with Matillion Data Productivity Cloud due to licensing or redistribution restrictions. Therefore, you will need to manually obtain and install the appropriate driver before you can establish connections to IDMS databases.

Step 1: Obtain the CA IDMS JDBC Driver

Visit the official web resources or contact your CA/Broadcom support representative to acquire the CA IDMS JDBC driver. At the time of this writing, a direct external download link for the driver may not be provided here due to licensing requirements and redistribution restrictions. If you are given multiple options, ensure you select a Type 4 JDBC driver, as this provides pure Java connectivity and optimal compatibility with cloud environments like Matillion Data Productivity Cloud.

Step 2: Download the Driver

Once you have the appropriate permissions and access, download the CA IDMS Type 4 JDBC driver from the location specified by your vendor or administrator. In some cases, the driver may be distributed as a JAR file.

Step 3: Upload the Driver to Matillion Data Productivity Cloud

Matillion Data Productivity Cloud allows the addition of external JDBC drivers to extend its database connectivity. Follow the official Matillion documentation for uploading custom drivers, which explains how to upload and register the driver via the Matillion user interface or agent configuration, making it available for use within database connection components.

Step 4: Configure and Use the Driver

After successfully uploading the driver, you can start setting up database connections and queries against your CA IDMS system. Matillion's documentation covers selecting your custom driver, specifying connection parameters, and executing queries within your workflows.

Note: Since the driver is not included by default, ensure your organization has the appropriate third-party licenses, and adhere to any additional prerequisites mandated by your compliance policies or mainframe administrators.

Checking network connectivity

Before connecting Matillion Data Productivity Cloud to your CA IDMS database, you must ensure that the CA IDMS database allows incoming connections from the appropriate network locations for your deployment type, whether that is the Matillion Full SaaS service or a Hybrid SaaS agent running in your own network.

Additionally, if your CA IDMS database is accessed using a DNS hostname rather than a direct IP address, it is essential that the Matillion Full SaaS or Hybrid SaaS agent is able to resolve this DNS address correctly. Ensure that any required DNS records are configured and accessible from the relevant network location to prevent connectivity issues.

Querying Data from a CA IDMS Database

This guide explains how to query data from a CA IDMS database, including SQL SELECT statement examples, managing datatype conversions (e.g., with Amazon Redshift), and best practices for initial and incremental data loads using a consistent Database Query approach.


1. Querying with SQL SELECT Statements

CA IDMS supports an SQL interface, though traditional usage often involves network-defined access paths. For modern integration and ETL processes, use SQL SELECT statements similar to those in standard relational databases.

Example: Selecting All Columns from a Table

SELECT * FROM EMPLOYEE;

Example: Filtering Results

SELECT EMP_ID, NAME, DEPARTMENT
FROM EMPLOYEE
WHERE DEPARTMENT = 'SALES';

Example: Joining Tables

SELECT E.EMP_ID, E.NAME, D.DEPARTMENT_NAME
FROM EMPLOYEE E
JOIN DEPARTMENT D ON E.DEPT_ID = D.DEPT_ID
WHERE D.LOCATION = 'CHICAGO';

Syntax may vary based on your CA IDMS SQL schema definitions. Use your tool's native SQL dialect if extended features are necessary.


2. Datatype Conversion with Redshift

When extracting data from CA IDMS for loading into Amazon Redshift, expect datatype conversions. For example:

  • CA IDMS numeric types (e.g., INTEGER, DECIMAL) map to Redshift’s INTEGER, BIGINT, or DECIMAL types.
  • CA IDMS character types (e.g., CHAR(n), VARCHAR(n)) map to Redshift’s VARCHAR(n).
  • Dates and times (DATE, TIME, TIMESTAMP) may need format conversion.

You may need to CAST or FORMAT certain columns during extraction or loading to ensure compatibility:

SELECT EMP_ID,
       CAST(HIRE_DATE AS CHAR(10)) AS HIRE_DATE_STRING
FROM EMPLOYEE;
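
To receive the converted data, the Redshift target table can be declared with the mapped types. The table and column definitions below are illustrative assumptions, not a fixed schema — adjust names and sizes to match your actual IDMS record layout:

CREATE TABLE employee (
    emp_id      INTEGER       NOT NULL,   -- from IDMS INTEGER
    name        VARCHAR(60),              -- from IDMS CHAR/VARCHAR
    department  VARCHAR(30),
    hire_date   DATE,                     -- converted from a date string if needed
    salary      DECIMAL(11,2)             -- from IDMS DECIMAL
);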


3. Data Load Pattern: Initial and Incremental Loads

The recommended approach—especially when replicating CA IDMS to Redshift—is to use a two-phase load strategy: a one-time initial load, followed by ongoing incremental loads. Use the same Database Query component (in Matillion ETL or similar tools) for both phases.

a. Initial Load

  • Purpose: Load the entire dataset once.
  • Query: No filter clause—extract all records.

SELECT * FROM EMPLOYEE;

The Database Query component should have no WHERE clause, so it extracts all data.

b. Incremental Load

  • Purpose: Extract new or updated records since the last load.
  • Query: Add a suitable filter clause (usually a timestamp or incrementing key).

SELECT *
FROM EMPLOYEE
WHERE LAST_UPDATED > '2024-06-01 00:00:00';

  • The comparison value in the WHERE clause should be parameterized using the maximum LAST_UPDATED timestamp captured from your previous load.
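
In practice, the literal timestamp is usually replaced by a pipeline variable that is refreshed after each run. A hedged sketch using Matillion-style ${...} variable substitution follows; the variable name last_high_watermark is an assumption for illustration:

-- Incremental extract driven by a watermark variable.
-- ${last_high_watermark} holds the maximum LAST_UPDATED value
-- captured by the previous run.
SELECT *
FROM EMPLOYEE
WHERE LAST_UPDATED > '${last_high_watermark}';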

Reference: See Matillion's incremental load replication strategy for more details.


Note: Both approaches (initial and incremental) use the same Database Query component, only changing the filter criteria. This consistency simplifies pipeline configuration and maintenance.
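
Once an incremental extract has landed in a Redshift staging table, the changed rows still need to be merged into the target. A common Redshift pattern is delete-then-insert inside a transaction; the table names and key column below are assumptions for illustration:

-- Merge staged incremental rows into the target table.
-- Assumes employee_stage was just loaded and emp_id is the business key.
BEGIN;

DELETE FROM employee
USING employee_stage
WHERE employee.emp_id = employee_stage.emp_id;

INSERT INTO employee
SELECT * FROM employee_stage;

COMMIT;

-- Clear the staging table for the next run. TRUNCATE commits
-- immediately in Redshift, so it is issued outside the transaction.
TRUNCATE employee_stage;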

Data Integration Architecture

Loading data in advance of integration is a core principle of the Extract, Load, and Transform (ELT) architecture, which divides the data integration challenge into two distinct steps: first loading the raw data into Amazon Redshift, and then performing integration and transformation within the database itself.

Data integration inherently involves data transformation, and data transformation pipelines are the most effective method for consistently processing and converting raw data into usable formats. A major advantage of the ELT approach is that all transformation and integration activities take place within the target Redshift environment. This lets you leverage Redshift’s processing power to execute transformations quickly, efficiently, and scalably, directly where the data resides, without paying for or maintaining separate data processing infrastructure. The result is an on-demand, cost-effective, and flexible solution for integrating diverse datasets.
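
As a small illustration of transforming where the data resides, raw IDMS tables loaded into Redshift can be integrated with other sources entirely in SQL. All schema and table names here are hypothetical:

-- Integrate raw IDMS data with another loaded source inside Redshift.
-- raw_idms and raw_crm are hypothetical schemas populated by the load step.
CREATE VIEW analytics.employee_sales AS
SELECT e.emp_id,
       e.name,
       d.department_name,
       s.total_sales
FROM raw_idms.employee e
JOIN raw_idms.department d
  ON e.dept_id = d.dept_id
LEFT JOIN raw_crm.sales_summary s
  ON e.emp_id = s.emp_id;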

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.