Extracting data from Datacom (Broadcom) to Amazon Redshift
Extracting data from Datacom is a critical step for organizations seeking to leverage their mainframe data within modern analytics platforms such as Amazon Redshift. This process enables businesses to derive actionable insights by integrating legacy data sources with scalable cloud data warehouses. In this article, we will guide you through each stage of extracting data from Datacom and loading it into Amazon Redshift. We will begin by describing how to create an identity in Datacom, a prerequisite for secure data access. For users leveraging Matillion ETL, we will outline how to verify whether the required JDBC driver is available, and if not, the process of acquiring and configuring it. Next, we will discuss important considerations around establishing robust and secure network connectivity between Datacom and Redshift. Finally, the article will cover best practices for both initial data extraction and the implementation of incremental data loads to ensure ongoing data freshness. Whether you are new to Datacom integration or looking to streamline existing processes, this guide aims to provide clear and actionable steps for a successful data migration workflow.
What is Datacom (Broadcom)?
The Datacom database, now maintained by Broadcom Inc., is a high-performance database system for mainframes, originally developed by Applied Data Research (ADR) and later owned by Computer Associates (CA). It supports both relational and network data models, providing strong data integrity, multi-user concurrency, and transactional consistency. Used widely in finance and government, Datacom offers SQL capabilities via Datacom SQL and native navigational access, plus robust tools for administration, monitoring, backup, and recovery. Its flexible architecture enables integration with both modern and legacy applications, making it a strategic choice for organizations needing reliable, scalable databases that balance mission-critical performance with evolving IT requirements.
What is Amazon Redshift?
Amazon Redshift is a fully managed, petabyte-scale cloud data warehouse service optimized for online analytic processing (OLAP). Built on PostgreSQL, Redshift employs columnar storage, massively parallel processing (MPP), and advanced data compression to enable high-performance querying and rapid data loading from diverse sources, including Amazon S3, relational databases, and flat files. It seamlessly integrates with the broader AWS ecosystem, supporting features such as Redshift Spectrum for querying data across both Redshift and external S3 datasets, as well as robust security controls, automated backups, and elastic scaling. Redshift’s architecture is designed to support complex analytical workloads with minimal administrative overhead while ensuring data durability, scalability, and cost-effectiveness for large-scale enterprise deployments.
Why Move Data from Datacom (Broadcom) into Amazon Redshift
The Benefits of Copying Data from Datacom to Redshift for Advanced Analytics
A data engineer or architect might choose to copy data from Datacom into Amazon Redshift for several compelling reasons. Firstly, Datacom often contains large volumes of operational data that can yield significant business value when analyzed or integrated with information from other systems. By consolidating Datacom data with data from other sources in Redshift, organizations can uncover deeper insights, enable comprehensive analytics, and support better decision making. Importantly, transferring data to Redshift also helps preserve the performance and stability of the legacy Datacom system, as computationally intensive analytical workloads are offloaded to a purpose-built, scalable data warehouse rather than running directly on the transactional source. This approach maximizes data utility without jeopardizing the reliability or responsiveness of the operational environment.
Creating a User in Datacom Database
To create a user in a Datacom database, you typically work through SQL Services, where installed. Datacom's security model can also be managed through resource definitions, but in environments using Datacom SQL, user management is handled with GRANT statements once the identity is known to the external security system (e.g., RACF, Top Secret, or ACF2 on z/OS).
Below are the general steps for creating and authorizing a user in a Datacom SQL environment.
1. Prerequisites
- Ensure you have appropriate administrative privileges to manage users and permissions.
- If external security (RACF, ACF2, Top Secret, etc.) is in use, confirm the user identity is already established there.
- Datacom SQL Services should be installed and enabled.
2. Authorizing an External Identity (Example: RACF user)
Datacom itself does not manage authentication natively, but you map the external identity (user) to database permissions.
```
-- Example: Granting privileges to RACF user 'DBUSER01'
GRANT CONNECT TO DBUSER01;
GRANT USAGE ON DATABASE MyDatabase TO DBUSER01;
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE MyDatabase.MyTable TO DBUSER01;
```
- GRANT CONNECT enables the user to connect to Datacom SQL using their external credentials.
- GRANT USAGE allows the user to access a specified database.
- Table-level permissions (SELECT, INSERT, etc.) must be granted as required.
3. Creating an Internal SQL User (Optional Security Model)
If you are running an environment where internal authentication is configured:
```
-- Example: Creating a SQL user and granting permissions
CREATE USER MYUSER01 IDENTIFIED BY 'StrongPassword1';
GRANT CONNECT TO MYUSER01;
GRANT USAGE ON DATABASE MyDatabase TO MYUSER01;
```
Note: Internal authentication is rarely used for production on mainframe sites, where centralized security (RACF, ACF2, Top Secret) is preferred.
4. Granting Roles (Optional)
Optionally, you can create roles and grant them to users for easier management:
```
CREATE ROLE APP_READWRITE;
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE MyDatabase.MyTable TO APP_READWRITE;
GRANT APP_READWRITE TO DBUSER01;
```
5. Sample Script (Full Example)
```
-- Granting a new user privileges to access a table in Datacom
GRANT CONNECT TO DBUSER01;
GRANT USAGE ON DATABASE CUSTOMERDB TO DBUSER01;
GRANT SELECT, INSERT, UPDATE ON TABLE CUSTOMERDB.CUSTOMER TO DBUSER01;
```
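When many users or tables are involved, the GRANT sequence above can be scripted. Below is a minimal Python sketch that generates the same statement sequence for a given user; the user, database, and table names are illustrative only, and the generated statements would still be executed through your normal Datacom SQL tooling.

```python
# Hypothetical helper that generates the Datacom GRANT statements shown above
# for one external user. All names (DBUSER01, CUSTOMERDB, etc.) are examples.

def grant_statements(user, database, table, privileges=("SELECT",)):
    """Build the sequence of GRANT statements for one external user."""
    privs = ", ".join(privileges)
    return [
        f"GRANT CONNECT TO {user};",
        f"GRANT USAGE ON DATABASE {database} TO {user};",
        f"GRANT {privs} ON TABLE {database}.{table} TO {user};",
    ]

for stmt in grant_statements("DBUSER01", "CUSTOMERDB", "CUSTOMER",
                             ("SELECT", "INSERT", "UPDATE")):
    print(stmt)
```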
References
For complete details, refer to the official Broadcom Datacom SQL Reference documentation, and consult with your security administrator to align Datacom user management with your site's policies.
Installing the JDBC Driver
At the time of writing, the JDBC driver for Datacom is not distributed with Matillion Data Productivity Cloud due to licensing or redistribution restrictions. Therefore, you must manually obtain and install the driver before establishing any database connections.
Download the Datacom JDBC Driver:
- Obtain the driver directly from the authorized distributor or vendor; consult the vendor for the current download link.
When downloading, be sure to select the Type 4 JDBC driver version where possible. Type 4 drivers are standalone, Java-based drivers that do not require native libraries, providing the highest degree of compatibility and ease of deployment in Java applications and cloud environments like Matillion.
Upload and Install the Driver:
- Once downloaded, you must add the JDBC driver to your Matillion Data Productivity Cloud agent.
- Detailed step-by-step instructions for uploading external JDBC drivers to the agent are found at:
https://docs.matillion.com/data-productivity-cloud/agent/docs/uploading-external-drivers/
Configuring JDBC Connections:
- After installing the driver, refer to the official usage documentation for configuring and managing your JDBC database connections within Matillion:
https://docs.matillion.com/data-productivity-cloud/designer/docs/database-query/
With the driver installed following these instructions, you’ll be able to use Datacom as a source or target within Matillion Data Productivity Cloud designer workflows.
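The exact JDBC URL format is defined by the driver you installed, so treat the following Python sketch as illustrative only: it assembles a connection URL under the assumption of a generic jdbc:datacom://host:port/database scheme, which you should replace with the format your driver's documentation specifies.

```python
# Sketch of assembling a JDBC connection URL for the driver configured above.
# The URL scheme and property syntax are ASSUMPTIONS, not the vendor's actual
# format: check the driver documentation for what it really expects.

def build_jdbc_url(host, port, database, **props):
    """Assemble a JDBC URL like jdbc:datacom://host:port/db;key=value."""
    url = f"jdbc:datacom://{host}:{port}/{database}"
    if props:
        url += ";" + ";".join(f"{k}={v}" for k, v in sorted(props.items()))
    return url

print(build_jdbc_url("mainframe.example.com", 9000, "CUSTOMERDB", user="DBUSER01"))
```

Centralizing URL construction in one helper keeps the driver-specific syntax in a single place if the format needs correcting later.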
Checking network connectivity
Before connecting Matillion Data Productivity Cloud to your Datacom database, you must ensure that the database allows incoming connections from the correct sources, depending on your deployment type:
- Full SaaS agent configuration: The Datacom database must be configured to allow incoming connections from the IP addresses used by Matillion Data Productivity Cloud. Refer to the list of current IP addresses here: Allowing IP addresses for connections.
- Hybrid SaaS deployment: In this configuration, your Datacom database must allow incoming connections from your own virtual private cloud (VPC). To validate or troubleshoot connectivity from your environment, helpful utilities are available on the Matillion Exchange: Check Network Access.
Additionally, if your Datacom database is referenced by a DNS hostname (instead of a direct IP address), ensure that the Full SaaS or Hybrid SaaS agent can resolve the DNS name. Proper DNS resolution is required for successful connectivity.
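A quick probe from the agent's network can confirm both DNS resolution and TCP reachability before you configure the connection. This is a generic Python sketch; the hostname and port shown are placeholders for your own environment.

```python
# Minimal connectivity probe mirroring the checks described above: resolve the
# DNS name the agent will use, then attempt a TCP connection to the database
# port. Hostname and port are placeholders for your environment.
import socket

def check_reachable(host, port, timeout=5.0):
    """Return (resolved_ip, reachable) for the given host and port."""
    ip = socket.gethostbyname(host)        # raises here => DNS problem
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True                # TCP handshake succeeded
    except OSError:
        return ip, False                   # name resolved, but port unreachable

# Example (placeholder values):
# ip, ok = check_reachable("datacom.example.internal", 9000)
```

Because the DNS lookup happens before the connection attempt, a failure on the gethostbyname call distinguishes name-resolution problems from firewall or routing problems.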
Querying Data from a Datacom Database
This guide provides technical instructions for querying data from a Datacom database, including query examples, datatype conversion considerations, and recommended patterns for loading data (initial and incremental).
1. Example: SQL SELECT Statements in Datacom
Datacom supports SQL access using familiar SELECT syntax. Here are example queries:
Basic Query
```
SELECT CUSTOMER_ID, CUSTOMER_NAME, EMAIL FROM CUSTOMER;
```
Filtered Query
```
SELECT ORDER_ID, ORDER_DATE, AMOUNT FROM ORDERS WHERE ORDER_DATE >= '2023-01-01';
```
Joining Tables
```
SELECT C.CUSTOMER_NAME, O.ORDER_ID, O.AMOUNT
FROM CUSTOMER C
JOIN ORDERS O ON C.CUSTOMER_ID = O.CUSTOMER_ID;
```
2. Datatype Conversion: Datacom to Redshift
If loading data from Datacom into Amazon Redshift, beware of datatype differences. Common conversions include:
| Datacom Datatype | Compatible Redshift Datatype |
|---|---|
| CHAR(n) | VARCHAR(n) |
| NUMERIC(p,s) | DECIMAL(p,s) |
| DATE | DATE |
| TIME | TIME |
| SMALLINT | SMALLINT |
| FLOAT | DOUBLE PRECISION |
When designing ETL processes, confirm compatible mapping to prevent load errors or data truncation.
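To apply these mappings programmatically, the table above can be encoded as a small lookup. The following Python sketch is illustrative; extend the map for any site-specific types your schema uses that are not listed.

```python
import re

# The Datacom -> Redshift type mapping from the table above, encoded so an
# ETL script can translate column definitions mechanically.
_TYPE_MAP = {
    "CHAR": "VARCHAR",        # CHAR(n) -> VARCHAR(n), keeps the length
    "NUMERIC": "DECIMAL",     # NUMERIC(p,s) -> DECIMAL(p,s)
    "DATE": "DATE",
    "TIME": "TIME",
    "SMALLINT": "SMALLINT",
    "FLOAT": "DOUBLE PRECISION",
}

def to_redshift_type(datacom_type):
    """Map a Datacom type string such as 'NUMERIC(10,2)' to its Redshift equivalent."""
    m = re.match(r"([A-Z ]+?)(\(.*\))?$", datacom_type.strip().upper())
    base, args = m.group(1).strip(), m.group(2) or ""
    return _TYPE_MAP[base] + args

print(to_redshift_type("CHAR(20)"))       # VARCHAR(20)
print(to_redshift_type("NUMERIC(10,2)"))  # DECIMAL(10,2)
```

An unmapped type raises a KeyError, which is usually preferable to silently loading a column with the wrong type.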
3. Loading Patterns: Initial and Incremental Loads
3.1 Initial Load
- Purpose: Perform a one-time, complete copy of the desired data.
- Pattern: Use the Database Query component without a WHERE clause to select all records.
- Example:
```
SELECT * FROM PURCHASE_TRANSACTIONS;
```
3.2 Incremental Load
- Purpose: Regularly import new or changed records since the last successful load.
- Pattern: Use the Database Query component with a filter clause. Filter typically on a timestamp or an incrementing ID.
- Example:
```
SELECT * FROM PURCHASE_TRANSACTIONS
WHERE LAST_UPDATED > '${last_successful_load_time}';
```
${last_successful_load_time} is a placeholder for your process variable tracking load state.
Read more about Incremental Load: Data Replication Strategy.
- Tip: Utilize the same Database Query component for both initial and incremental loads, modifying only the WHERE/filter clause according to the load type.
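The incremental pattern depends on persisting the watermark between runs. In Matillion this is typically a pipeline variable, but the bookkeeping can be sketched in Python as follows; the state file name and the LAST_UPDATED column are illustrative, not part of any Matillion or Datacom API.

```python
import json, os

# Sketch of the watermark bookkeeping behind the incremental pattern above:
# read the last successful load time, substitute it into the filter, and
# advance it only after a load succeeds. File and column names are examples.
STATE_FILE = "load_state.json"

def last_watermark(default="1900-01-01 00:00:00"):
    """Return the saved watermark, or a default for the very first (initial) load."""
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return json.load(f)["last_successful_load_time"]
    return default

def incremental_query(watermark):
    """Build the filtered extraction query for the current watermark."""
    return ("SELECT * FROM PURCHASE_TRANSACTIONS "
            f"WHERE LAST_UPDATED > '{watermark}'")

def save_watermark(new_time):
    """Persist the watermark; call this only after a successful load."""
    with open(STATE_FILE, "w") as f:
        json.dump({"last_successful_load_time": new_time}, f)

print(incremental_query(last_watermark()))
```

The key discipline is to advance the watermark only after a load completes successfully, so a failed run is retried from the same point instead of silently skipping records.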
By following these guidelines, you can reliably query, extract, and load data from Datacom, handling both bulk and incremental scenarios efficiently.
Data Integration Architecture
Loading data into Redshift in advance of integration offers a practical "divide and conquer" approach by breaking the process into two distinct steps: extraction and loading first, followed by transformation and integration. This methodology is a principal advantage of the ELT (Extract, Load, Transform) architecture, as it separates the often time-consuming data movement from the computationally intensive transformation logic. Effective data integration relies on transforming raw data into a consistent, usable format—best achieved through well-designed data transformation pipelines. Another key benefit of the ELT approach is that these transformations and the ultimate data integration activities are performed directly within the Redshift environment. As a result, integration tasks are fast, available on-demand, and can scale with the needs of the business—all while avoiding the need to provision and maintain separate data processing infrastructure, leading to both operational and cost efficiencies.