Extracting data from CA IDMS (Broadcom) to Amazon Redshift
Extracting data from CA IDMS is a critical task for organizations seeking to modernize their data infrastructure or enable advanced analytics. Moving data from legacy mainframe systems such as CA IDMS to modern cloud data warehouses like Amazon Redshift can unlock the value of historical data and facilitate new insights. However, this process requires careful planning and attention to detail, given the unique characteristics of each system. In this article, we will guide you through the essential steps to extract data from CA IDMS and load it into Amazon Redshift. We will begin by outlining the process of creating an appropriate user identity in CA IDMS to ensure secure and authorized access. For those using Matillion, we will explain how to verify that you have the necessary JDBC driver and how to acquire it if needed. Network connectivity considerations, which are crucial for seamless and secure data transfer between the mainframe and Redshift, will also be addressed. Finally, we will discuss recommended approaches for querying your data—both for the initial full data extraction and for setting up incremental loads to keep Redshift synchronized with changes in CA IDMS. Whether you are new to these platforms or looking to optimize your existing process, this article aims to provide a clear and practical roadmap.
What is CA IDMS (Broadcom)?
CA IDMS (Integrated Database Management System) is a high-performance network model database system originally developed for mainframe environments and now maintained by Broadcom. It provides robust, mission-critical data management capabilities, supporting complex database structures and high transaction volumes typical of large enterprises. IDMS emphasizes record-at-a-time data processing, navigational access paths, and schema definitions tailored for hierarchical and networked data relationships, making it well-suited for applications with intricate data interdependencies and stringent reliability requirements. Native integration with COBOL and other mainframe languages, extensive transaction management, and advanced backup and recovery features ensure uninterrupted operations and data integrity for organizations relying on legacy system architectures.
What is Amazon Redshift?
Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud, designed to facilitate high-performance analysis of large datasets. It leverages columnar storage, parallel query execution, and extensive data compression to optimize query speed and cost efficiency. Redshift supports SQL queries via the PostgreSQL wire protocol, enabling easy integration with existing analytics tools and ETL processes. Its architecture allows for seamless scaling, with features like concurrency scaling and Spectrum for querying data directly in Amazon S3. Security is enforced through encryption, VPC isolation, and integration with AWS IAM, making it a robust solution for enterprises seeking scalable and secure analytical workloads.
Why Move Data from CA IDMS (Broadcom) into Amazon Redshift
Unlocking Analytics: The Benefits of Migrating CA IDMS Data to Amazon Redshift
A data engineer or architect may seek to copy data from CA IDMS into Amazon Redshift for several important reasons. First, CA IDMS often stores extensive historical and operational information that holds significant value for organizations seeking to derive business insights. However, this value is often realized fully only when the data is integrated with other enterprise data sources, enabling comprehensive analysis and deeper visibility into business processes. By transferring this valuable data into Redshift, a cloud-based data warehouse optimized for analytics workloads, organizations can efficiently combine and analyze data from multiple systems. Importantly, conducting such integration and analysis in Redshift prevents additional query and processing load on the CA IDMS platform itself, ensuring that critical transactional operations remain unaffected while still unlocking the data’s analytical potential.
Creating an Identity in CA IDMS
To define a new user in a CA IDMS database, you use the CREATE USER Data Definition Language (DDL) statement. The new identity allows the user to connect and to be assigned security attributes.
Below are step-by-step instructions for creating a user in CA IDMS using SQL.
1. Prepare Your Environment
Ensure you have the necessary administrative privileges (usually the DBA role) to create users.
2. Connect to the IDMS Database
Use a supported interface, such as IDMS Service Administrator, OCF, or a mainframe terminal environment equipped with an SQL prompt.
CONNECT TO system_name USER DBA USING password;
Replace system_name with your DBNAME, and provide valid DBA credentials.
3. Create a New User
Issue the CREATE USER statement. You must supply a user name and can optionally specify attributes such as a password, default schema, and allowed privileges.
CREATE USER new_user
IDENTIFIED BY password
DEFAULT SCHEMA schema_name;
- new_user: The user's unique IDMS name (up to 32 characters).
- password: Initial password for the user.
- schema_name: (Optional) Sets the user's default schema.
Example
CREATE USER analyst01
IDENTIFIED BY Xw39sjk!
DEFAULT SCHEMA payroll_schema;
4. Grant Roles or Privileges
You must grant appropriate privileges to the new user. For example, to grant them the READ privilege on a specific schema:
GRANT READ ON SCHEMA payroll_schema TO analyst01;
Or, to assign a role (if your system uses roles):
GRANT payroll_role TO analyst01;
5. Commit the Changes
On some systems, changes are made within a transaction and must be committed:
COMMIT;
Note: User creation in CA IDMS may be subject to your site's specific security authorization and operating procedures. Always verify local guidelines before performing security administration tasks.
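The steps above can also be scripted. The sketch below simply assembles the CREATE USER, GRANT, and COMMIT statements shown earlier into an ordered batch you could submit through your IDMS SQL interface (OCF, a batch job, or JDBC). The helper function and its parameters are illustrative, not part of any IDMS API, and identifier quoting or site-specific security rules are left to the caller.

```python
def build_user_ddl(user, password, schema, privileges=("READ",)):
    """Assemble the user-creation statements from the steps above.

    Returns an ordered list of SQL strings: CREATE USER first, then one
    GRANT per requested privilege on the default schema, then COMMIT.
    """
    statements = [
        f"CREATE USER {user} IDENTIFIED BY {password} DEFAULT SCHEMA {schema};"
    ]
    for priv in privileges:
        statements.append(f"GRANT {priv} ON SCHEMA {schema} TO {user};")
    statements.append("COMMIT;")
    return statements

# Example: the analyst01 user from the walkthrough above
for stmt in build_user_ddl("analyst01", "Xw39sjk!", "payroll_schema"):
    print(stmt)
```

In practice you would feed each statement to your SQL interface in order, stopping on the first error so a partial user definition is never committed.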
Installing the JDBC Driver
The CA IDMS JDBC driver enables connections between Matillion Data Productivity Cloud and CA IDMS databases. Please note that the JDBC driver for CA IDMS is not pre-installed with Matillion Data Productivity Cloud due to licensing or redistribution restrictions. Therefore, you will need to manually obtain and install the appropriate driver before you can establish connections to IDMS databases.
Step 1: Obtain the CA IDMS JDBC Driver
Visit Broadcom's official web resources or contact your CA/Broadcom support representative to acquire the CA IDMS JDBC driver. A direct download link is not provided here because of licensing and redistribution restrictions. If you are given multiple options, select a Type 4 JDBC driver, as this provides pure Java connectivity and the best compatibility with cloud environments like Matillion Data Productivity Cloud.
Step 2: Download the Driver
Once you have the appropriate permissions and access, download the CA IDMS Type 4 JDBC driver from the location specified by your vendor or administrator. In some cases, the driver may be distributed as a JAR file.
Step 3: Upload the Driver to Matillion Data Productivity Cloud
Matillion Data Productivity Cloud allows the addition of external JDBC drivers to extend its database connectivity. Follow the official instructions provided by Matillion to upload the JDBC driver to your environment:
- Refer to the installation guide here: Uploading External Drivers
This guide outlines how to upload and register the driver via the Matillion user interface or agent configuration, making it available for use within database connection components.
Step 4: Configure and Use the Driver
After successfully uploading the driver, you can start setting up database queries and connections to your CA IDMS system.
- For detailed steps on configuring a database query component to use your newly-installed driver, consult: Database Query Usage Instructions
These instructions provide information on selecting your custom driver, specifying connection parameters, and executing queries within your workflows.
Note: Since the driver is not included by default, ensure your organization has the appropriate third-party licenses, and adhere to any additional prerequisites mandated by your compliance policies or mainframe administrators.
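Once the driver JAR is uploaded, the connection details reduce to a driver class name, a JDBC URL, and credentials. The sketch below gathers these into a single configuration object. Note that the URL format, port, and driver class name shown are placeholders: consult the documentation shipped with your Broadcom-supplied driver for the actual values before configuring the Matillion connection.

```python
# Sketch: collecting JDBC connection settings for the IDMS driver.
# The driver class name and URL format are placeholders -- check the
# documentation bundled with your Broadcom-supplied driver for real values.
def build_jdbc_config(host, port, dbname, user, jar_path):
    url = f"jdbc:idms://{host}:{port}/{dbname}"  # hypothetical URL format
    return {
        "driver_class": "com.example.idms.Driver",  # placeholder class name
        "url": url,
        "user": user,
        "classpath": [jar_path],  # path to the uploaded driver JAR
    }

cfg = build_jdbc_config("mainframe.example.com", 3709, "EMPDEMO",
                        "analyst01", "/drivers/idms-jdbc.jar")
print(cfg["url"])
```

Keeping these settings in one place makes it easy to reuse them between the Matillion connection form and any pre-flight connectivity tests.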
Checking network connectivity
Before connecting Matillion Data Productivity Cloud to your CA IDMS database, you must ensure that the CA IDMS database allows incoming connections from the appropriate network locations, depending on your deployment type:
- Full SaaS Agent Configuration: Your CA IDMS database must accept incoming connections from the specific public IP addresses used by Matillion's SaaS agents. The list of required IP addresses is maintained at: https://docs.matillion.com/data-productivity-cloud/security/docs/allowing-ip-addresses/. Update your firewall or network security settings to allow these IPs.
- Hybrid SaaS Deployment: For Hybrid SaaS setups, the CA IDMS database must accept connections from within your own Virtual Private Cloud (VPC). This means allowing inbound access from the private network addresses allocated to your environment. To facilitate testing and validation, you can use the network utilities provided at: https://exchange.matillion.com/data-productivity-cloud/pipeline/check-network-access/.
Additionally, if your CA IDMS database is accessed using a DNS hostname rather than a direct IP address, it is essential that the Matillion Full SaaS or Hybrid SaaS agent is able to resolve this DNS address correctly. Ensure that any required DNS records are configured and accessible from the relevant network location to prevent connectivity issues.
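A quick way to validate both DNS resolution and firewall rules from the network where your agent runs is a simple TCP connection test. The sketch below uses only the Python standard library; the hostname and port in the commented example are hypothetical, so substitute your own listener address.

```python
import socket

def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This exercises both DNS resolution (socket.create_connection resolves
    the hostname first) and the network path, so a False result can mean
    a missing DNS record, a firewall block, or no listener on that port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (hypothetical host/port -- substitute your IDMS listener):
# print(can_reach("idms-host.example.com", 3709))
```

Run this from the same network location as the Matillion agent; a success from your laptop proves nothing about the agent's own routing and firewall rules.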
Querying Data from a CA IDMS Database
This guide explains how to query data from a CA IDMS database, including SQL SELECT statement examples, managing datatype conversions (e.g., with Amazon Redshift), and best practices for initial and incremental data loads using a consistent Database Query approach.
1. Querying with SQL SELECT Statements
CA IDMS supports an SQL interface, though traditional usage often involves network-defined access paths. For modern integration and ETL processes, use SQL SELECT statements similar to those in standard relational databases.
Example: Selecting All Columns from a Table
SELECT * FROM EMPLOYEE;
Example: Filtering Results
SELECT EMP_ID, NAME, DEPARTMENT FROM EMPLOYEE WHERE DEPARTMENT = 'SALES';
Example: Joining Tables
SELECT E.EMP_ID, E.NAME, D.DEPARTMENT_NAME FROM EMPLOYEE E JOIN DEPARTMENT D ON E.DEPT_ID = D.DEPT_ID WHERE D.LOCATION = 'CHICAGO';
Syntax may vary based on your CA IDMS SQL schema definitions. Use your tool's native SQL dialect if extended features are necessary.
2. Datatype Conversion with Redshift
When extracting data from CA IDMS for loading into Amazon Redshift, expect datatype conversions. For example:
- CA IDMS numeric types (e.g., INTEGER, DECIMAL) map to Redshift's INTEGER, BIGINT, or DECIMAL types.
- CA IDMS character types (e.g., CHAR(n), VARCHAR(n)) map to Redshift's VARCHAR(n).
- Dates and times (DATE, TIME, TIMESTAMP) may need format conversion.
You may need to CAST or FORMAT certain columns during extraction or loading to ensure compatibility:
SELECT EMP_ID,
CAST(HIRE_DATE AS CHAR(10)) AS HIRE_DATE_STRING
FROM EMPLOYEE;
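If you are generating target table DDL programmatically, the mappings listed above can be encoded as a small lookup table. The sketch below is a minimal example based only on the pairs mentioned in this section; extend the table for site-specific types, and note that the fallback to VARCHAR for unrecognized types is an assumption, not an IDMS or Redshift rule.

```python
import re

# IDMS -> Redshift type mapping, based on the pairs listed above.
TYPE_MAP = {
    "INTEGER": "INTEGER",
    "DECIMAL": "DECIMAL",
    "CHAR": "VARCHAR",
    "VARCHAR": "VARCHAR",
    "DATE": "DATE",
    "TIME": "TIME",
    "TIMESTAMP": "TIMESTAMP",
}

def map_type(idms_type):
    """Map an IDMS column type such as 'CHAR(20)' to a Redshift type,
    preserving any length/precision suffix. Unmapped types fall back
    to VARCHAR (an assumption -- adjust for your schemas)."""
    m = re.match(r"(\w+)(\(.*\))?", idms_type.strip().upper())
    base, args = m.group(1), m.group(2) or ""
    return TYPE_MAP.get(base, "VARCHAR") + args

print(map_type("CHAR(20)"))      # VARCHAR(20)
print(map_type("DECIMAL(9,2)"))  # DECIMAL(9,2)
```

Precision and length limits differ between the two systems, so verify each generated type against Redshift's documented maximums before creating tables.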
3. Data Load Pattern: Initial and Incremental Loads
The recommended approach—especially when replicating CA IDMS to Redshift—is to use a two-phase load strategy: a one-time initial load, followed by ongoing incremental loads. Use the same Database Query component (in Matillion ETL or similar tools) for both phases.
a. Initial Load
- Purpose: Load the entire dataset once.
- Query: No filter clause—extract all records.
SELECT * FROM EMPLOYEE;
The Database Query component should have no WHERE clause, so that it extracts all data.
b. Incremental Load
- Purpose: Extract new or updated records since the last load.
- Query: Add a suitable filter clause (usually a timestamp or incrementing key).
SELECT * FROM EMPLOYEE WHERE LAST_UPDATED > '2024-06-01 00:00:00';
- The value compared against LAST_UPDATED in the WHERE clause should be parameterized using the maximum timestamp from your previous load.
Reference: See Matillion's incremental load replication strategy for more details.
Note: Both approaches (initial and incremental) use the same Database Query component, only changing the filter criteria. This consistency simplifies pipeline configuration and maintenance.
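The two-phase pattern can be expressed as a single query builder: the base SELECT is shared, and only the filter changes between runs. The sketch below is illustrative; how the watermark is persisted between runs (a Matillion pipeline variable, a control table, etc.) is an assumption left to your pipeline design.

```python
# Sketch of the two-phase load pattern described above: one base query,
# with the incremental filter applied only when a watermark exists.
BASE_QUERY = "SELECT * FROM EMPLOYEE"

def build_query(last_watermark=None):
    """Return the extraction query for an initial or incremental run.

    last_watermark: the max LAST_UPDATED value from the previous load,
    or None for the one-time initial full extract.
    """
    if last_watermark is None:
        return BASE_QUERY + ";"  # initial load: no filter clause
    return f"{BASE_QUERY} WHERE LAST_UPDATED > '{last_watermark}';"

print(build_query())                       # initial full extract
print(build_query("2024-06-01 00:00:00"))  # incremental extract
```

In a production pipeline, prefer a bind parameter over string interpolation for the watermark value, and record the new maximum LAST_UPDATED after each successful load.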
Data Integration Architecture
Loading data in advance of integration is a core principle of the Extract, Load, and Transform (ELT) architecture, allowing organizations to divide and conquer the data integration challenge by breaking it into two distinct steps: first loading the raw data into Amazon Redshift, and then performing integration and transformation within the database itself. Data integration inherently involves data transformation, and data transformation pipelines are the most effective method for consistently processing and converting raw data into usable formats. A major advantage of the ELT approach is that all transformation and integration activities take place within the target Redshift environment. This architecture allows you to leverage Redshift’s processing power to quickly, efficiently, and scalably execute transformations directly where the data resides, without needing to pay for or maintain separate data processing infrastructure. This results in an on-demand, cost-effective, and flexible solution for integrating diverse datasets.