Extracting data from IBM AS/400 to Amazon Redshift
Extracting data from IBM AS/400 is an important task for organizations seeking to integrate legacy systems with modern cloud data platforms such as Amazon Redshift. The process requires careful preparation to ensure data is accessed securely, efficiently, and in a way that supports both initial data loads and ongoing updates. In this article, we will guide you through the critical steps required for successful data movement from IBM AS/400 to Amazon Redshift. We will begin by outlining how to create an appropriate user identity in IBM AS/400 to facilitate secure data access. For users leveraging Matillion ETL, we will discuss how to check for or acquire the correct JDBC driver required to connect Matillion to the AS/400 system. Next, we will address establishing robust network connectivity between the source and target environments to support uninterrupted transfers. Finally, we will review best practices for querying and extracting data from AS/400—covering both initial, full-table extractions and ongoing incremental updates. By the end of this article, you will have a clear understanding of the preparation and execution steps needed to extract your critical business data from IBM AS/400 and load it seamlessly into Amazon Redshift.
What is IBM AS/400?
The IBM AS/400—now IBM i on Power Systems—features a deeply integrated relational database, originally called DB2/400, built into its operating system. This provides seamless multi-language support, strong data integrity, centralized management, and single-level storage for both objects and SQL tables (physical/logical files), accessible via native APIs or SQL. Over time, IBM added referential integrity, triggers, journaling, and distributed database support. This OS-level integration simplifies administration and security, reduces overhead, and contributes to the AS/400's reputation for reliability and performance.
What is Amazon Redshift?
Amazon Redshift is a fully managed, petabyte-scale cloud data warehouse from AWS, optimized for efficient storage, processing, and querying of large structured or semi-structured datasets. Built on PostgreSQL, Redshift uses columnar storage, advanced compression, and massive parallel processing (MPP) for high query performance on analytic workloads. It integrates seamlessly with AWS tools like S3, Glue, and Athena, offering robust security, workload management, data encryption, and automatic scaling. With a familiar SQL interface, compatibility with standard BI tools, and support for diverse data formats, Redshift is a versatile, high-performance, cloud-native solution for modern data warehousing needs.
Why Move Data from IBM AS/400 into Amazon Redshift
Unlocking Analytics: Benefits of Copying IBM AS/400 Data into Amazon Redshift
A data engineer or architect may wish to copy data from IBM AS/400 into Amazon Redshift for several compelling reasons. IBM AS/400 systems often contain vast repositories of business-critical, historical, or operational data that hold significant potential value for analytics and decision-making. However, the true worth of this data is fully realized only when it is integrated with information from other enterprise sources, enabling richer insights and more comprehensive analyses. By transporting AS/400 data into Redshift, organizations can leverage a modern, scalable cloud data warehouse specifically designed for large-scale analytical workloads. This approach not only simplifies the complex process of integrating diverse datasets but also ensures that computationally intensive queries and processing are offloaded from the AS/400 system, thereby preserving its performance for core transaction processing and day-to-day operations.
Creating a User in IBM AS/400 (IBM i) Database
On IBM AS/400 (IBM i) systems, database access is typically tied to user profiles at the operating system level, rather than accounts within the database itself. Therefore, you create a user profile that authorizes a person or application to access the system and its database.
Below are step-by-step instructions using CL commands and, optionally, SQL scripting to ensure a new user has the required access.
1. Creating a New User Profile
- Sign on to your AS/400 (IBM i) system with an account that has SECADM authority or sufficient privileges.
- Use the following CL (Control Language) command to create a user profile:
```plaintext
CRTUSRPRF USRPRF(NEWUSER)
          PASSWORD(MySecurePwd123)
          USRCLS(*USER)
          STATUS(*ENABLED)
          HOMEDIR('/home/NEWUSER')
          TEXT('User for Database Access')
```
You can customize the parameters:
- `USRPRF`: The user profile name (max 10 characters).
- `PASSWORD`: The initial password (the user will be prompted to change it on first sign-on if system policy dictates).
- `USRCLS`: User class (e.g., `*USER`, `*SYSOPR`, etc.).
- `STATUS`: Whether the user is enabled.
- `HOMEDIR`: Filesystem home directory (optional).
- `TEXT`: Descriptive text for the profile (optional).
2. Granting Database Access
User profiles are authenticated at the OS level, but you may need to grant explicit authority to database files (tables, libraries, etc.).
For example, grant access to a library or table:
- Granting access to a library:
```plaintext
GRTOBJAUT OBJ(MYLIBRARY)
          OBJTYPE(*LIB)
          USER(NEWUSER)
          AUT(*ALL)
```
- Granting access to a table/physical file:
```plaintext
GRTOBJAUT OBJ(MYLIBRARY/MYTABLE)
          OBJTYPE(*FILE)
          USER(NEWUSER)
          AUT(*ALL)
```
Adjust `AUT` as needed: `*USE`, `*CHANGE`, `*ALL`, etc.
3. (Optional) Setting Special Permissions Using SQL Scripts
While traditional user creation does not use SQL (as users are system profiles), you may grant data-specific permissions using SQL for finer granularity.
Example: Granting SELECT privilege via SQL
First, make sure to use an SQL interface such as Run SQL Scripts in IBM Navigator or a similar tool:
```sql
GRANT SELECT ON TABLE MYLIBRARY.MYTABLE TO USER NEWUSER;
```
Note: On Db2 for i, SQL privileges are granted to the OS user profile created above; the exact `TO USER` syntax may vary between SQL interfaces, so check the documentation for the tool you are using.
4. Verifying User Profile
You can view details of the new user profile with:
```plaintext
DSPUSRPRF USRPRF(NEWUSER)
```
This command displays the profile's authorities and other attributes.
References:
- IBM Documentation: User Profile (IBM i)
- Command help in AS/400: enter `HELP` or press `F1` in command entry screens.
Installing the JDBC driver
At the time of writing, the JDBC driver required to connect to IBM AS/400 databases is not bundled with Matillion Data Productivity Cloud installations. Due to licensing and redistribution restrictions, you will need to manually download and install the driver.
1. Obtain the IBM AS/400 JDBC Driver
You can download the official IBM JDBC driver for AS/400 (often referred to as IBM i Access or JTOpen) by visiting the IBM support portal at:
https://www.ibm.com/support/pages/node/619313
When downloading, prioritize selecting a Type 4 JDBC driver. Type 4 drivers are pure Java drivers and enable direct connectivity from your Java-based applications or cloud platforms without requiring any native library dependencies.
Carefully review and comply with any IBM license terms presented on their site. Download the appropriate driver package (typically distributed as a JAR file such as `jt400.jar`).
2. Upload the Driver to Matillion Data Productivity Cloud
Matillion Data Productivity Cloud provides mechanisms to support external JDBC drivers, as drivers like the AS/400 JDBC package are not included natively. You must upload the downloaded JAR file to your Matillion agent environment.
A full set of steps for uploading custom JDBC drivers can be found in the official documentation at:
https://docs.matillion.com/data-productivity-cloud/agent/docs/uploading-external-drivers/
In summary:
- Access the environment in your Matillion Data Productivity Cloud admin interface where agents are managed.
- Locate the capability to add or upload an external JDBC driver (typically within agent configuration or integrations).
- Follow the prompts or procedure to upload the JAR file you previously obtained (`jt400.jar` or similar).
- Save or apply the configuration changes as instructed.
3. Utilize the Driver in Your Matillion Workflows
Once the JDBC driver has been uploaded and registered in your Matillion environment, you can proceed to use it in your workflows and database connectivity tasks.
Instructions for configuring connectors and performing database operations using custom JDBC drivers are located at:
https://docs.matillion.com/data-productivity-cloud/designer/docs/database-query/
Follow these guidelines to:
- Set up a new database connection referencing the IBM AS/400 JDBC driver.
- Provide connection details such as physical system address, authentication information, and relevant parameters.
- Use this connection in your SQL queries, data extraction, and data transformation tasks within the Matillion Designer.
By following these steps and the referenced documentation, you will successfully enable connectivity between Matillion Data Productivity Cloud and your IBM AS/400 database using the official Type 4 JDBC driver.
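While Matillion's connection dialog assembles the JDBC URL for you, it can help to understand its shape when troubleshooting. Below is a minimal Python sketch of how a jt400-style connection URL is typically composed; the hostname and library are placeholders, and the `prompt` and `naming` properties are just examples of jt400 driver options you may or may not need.

```python
def build_as400_jdbc_url(host, library=None, props=None):
    """Assemble a jt400-style JDBC URL:
    jdbc:as400://host[/library][;prop=value;...]"""
    url = f"jdbc:as400://{host}"
    if library:
        url += f"/{library}"
    if props:
        url += ";" + ";".join(f"{k}={v}" for k, v in props.items())
    return url

# Example (hostname and library are placeholders):
url = build_as400_jdbc_url(
    "as400.example.com",
    library="MYLIBRARY",
    props={"prompt": "false", "naming": "sql"},
)
print(url)
# jdbc:as400://as400.example.com/MYLIBRARY;prompt=false;naming=sql
```

The generated URL, plus the user profile and password created earlier, are the core inputs Matillion needs to open the connection.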
Checking network connectivity
To ensure successful integration between Matillion Data Productivity Cloud and your IBM AS/400 database, you must verify that the database allows incoming network connections according to your deployment type:
- Full SaaS deployment: Configure your IBM AS/400 database to accept incoming connections from the set of IP addresses used by Matillion Data Productivity Cloud. The full list of required source IPs is maintained at: https://docs.matillion.com/data-productivity-cloud/security/docs/allowing-ip-addresses/
- Hybrid SaaS deployment: In this configuration, the Matillion Data Productivity Cloud agent runs within your own environment. You must allow incoming connections to the AS/400 database from the network address range used by your virtual private cloud (VPC). Utilities to help you check and troubleshoot connectivity are available at: https://exchange.matillion.com/data-productivity-cloud/pipeline/check-network-access/
Regardless of the deployment type, if you are referencing the IBM AS/400 database by its DNS name (rather than by its IP address), ensure that the Matillion agent—whether Full SaaS or Hybrid SaaS—has the necessary network access to resolve the DNS address successfully. Proper DNS resolution is required for connectivity.
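The DNS-resolution and reachability checks described above can also be scripted from any host on the agent's network. The following Python sketch resolves the AS/400 hostname and then attempts a TCP connection; port 8471 is commonly used by the IBM i database host server, but confirm which port your system actually exposes.

```python
import socket

def check_db_endpoint(hostname, port=8471, timeout=5):
    """Resolve the hostname, then attempt a TCP connection to the port.

    Port 8471 is a common default for the IBM i database host server;
    adjust it for your environment. Returns (resolved_ip, reachable),
    where reachable is False on any DNS or connection failure.
    """
    try:
        ip = socket.gethostbyname(hostname)  # DNS resolution check
    except socket.gaierror:
        return (None, False)
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return (ip, True)
    except OSError:
        return (ip, False)
```

If DNS resolution fails here, fix name resolution (or fall back to the IP address) before debugging firewall rules, since the JDBC driver will fail in the same way.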
Querying Data from an IBM AS/400 Database
IBM AS/400 databases (now known as IBM Db2 for i) support SQL for querying and manipulating data. When integrating AS/400 data with platforms like Amazon Redshift, be aware that datatype conversions may be required. Below are essential instructions covering typical SQL queries and recommended patterns for loading data.
1. Example IBM AS/400 SQL SELECT Statements
Here are basic query examples for AS/400:
```sql
-- Select all columns from a table
SELECT * FROM LIBRARY_NAME.TABLE_NAME;

-- Select specific columns and rows with a filter
SELECT ID, CUSTOMER_NAME, ORDER_DATE
FROM SALESORDERS
WHERE ORDER_DATE >= '2024-06-01';

-- Aggregate functions example
SELECT CUSTOMER_ID, COUNT(*) AS ORDER_COUNT
FROM SALESORDERS
GROUP BY CUSTOMER_ID;
```
Note: Replace `LIBRARY_NAME` and `TABLE_NAME` with your actual schema (library) and table names, as AS/400 uses libraries to organize tables.
2. Datatype Conversion
When transferring data from AS/400 to Amazon Redshift, some datatype transformations may be necessary due to platform differences:
- CHAR/VARCHAR: Usually map directly, but ensure length limits in Redshift are adequate.
- DECIMAL/NUMERIC: These map, but check that precision/scale is preserved.
- DATE/TIME: Convert to Redshift's date/time types (`DATE`, `TIMESTAMP`).
- Packed/zoned decimals (ZONED/DECIMAL on AS/400): May require explicit conversion during extraction or in the ETL layer.
Explicit casting in SQL can be useful, for example:
```sql
SELECT CAST(ORDER_AMOUNT AS DECIMAL(12,2)) AS ORDER_AMOUNT
FROM SALESORDERS;
```
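As a starting point, mappings like those above can be captured in a small lookup table that drives target DDL generation. This Python sketch is illustrative only; the type names and target choices are assumptions to verify against your own schemas and Redshift version.

```python
# Illustrative (not exhaustive) mapping of common Db2 for i datatypes to
# Redshift target types; a sketch to adapt, not a definitive reference.
AS400_TO_REDSHIFT = {
    "CHAR": "CHAR",
    "VARCHAR": "VARCHAR",
    "DECIMAL": "DECIMAL",
    "NUMERIC": "DECIMAL",
    "ZONED": "DECIMAL",      # zoned/packed decimals land in DECIMAL
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP",
}

def map_type(as400_type, precision=None, scale=None):
    """Translate a source datatype name into a Redshift DDL type string."""
    base = AS400_TO_REDSHIFT.get(as400_type.upper())
    if base is None:
        raise ValueError(f"No mapping defined for {as400_type}")
    # Carry precision/scale through for decimals so values are not truncated
    if base == "DECIMAL" and precision is not None:
        return f"DECIMAL({precision},{scale if scale is not None else 0})"
    return base

print(map_type("DECIMAL", 12, 2))  # DECIMAL(12,2)
```

Raising on unknown types (rather than silently defaulting to `VARCHAR`) makes gaps in the mapping visible early, before bad DDL reaches Redshift.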
3. Pattern for Initial and Incremental Loads
The recommended loading pattern is to perform a one-off initial load, followed by incremental loads. This approach ensures efficient data processing and minimizes load times.
- Initial Load: Full table extraction; no filter clause.
- Incremental Loads: Extract only changed or new data; uses a filter clause.
Use the same database query component in both cases, altering only the SQL.
Example: Database Query for Initial Load
```sql
-- No filter: retrieves the entire table
SELECT * FROM SALESORDERS;
```
Example: Database Query for Incremental Load
Assume you are incrementally loading using the `ORDER_DATE` column:

```sql
-- Filter clause: retrieves only new records since the last load
SELECT * FROM SALESORDERS
WHERE ORDER_DATE > '2024-06-21';
```
- The filter clause value (e.g., `'2024-06-21'`) should track the last successful incremental load.
For more details, see the Matillion knowledge base on incremental load data replication strategies.
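A common way to implement this pattern is to generate the extraction SQL from a stored high-water-mark value. The Python sketch below reuses the table and column names from this article's examples; in practice the last-loaded value would be persisted between pipeline runs, and parameterized queries are preferable to the string interpolation shown here.

```python
def build_extract_sql(table, watermark_column=None, last_value=None):
    """Return a full-extract query, or an incremental one when a
    high-water-mark value from the previous successful load is given."""
    sql = f"SELECT * FROM {table}"
    if watermark_column and last_value is not None:
        sql += f" WHERE {watermark_column} > '{last_value}'"
    return sql

# Initial load: no watermark recorded yet
print(build_extract_sql("SALESORDERS"))
# SELECT * FROM SALESORDERS

# Incremental load: filter on the last successfully loaded ORDER_DATE
print(build_extract_sql("SALESORDERS", "ORDER_DATE", "2024-06-21"))
# SELECT * FROM SALESORDERS WHERE ORDER_DATE > '2024-06-21'
```

Because both queries come from the same function, the same database query component can be reused for initial and incremental runs, with only the watermark value changing.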
Tip: Always thoroughly test your queries and datatype mappings to match your downstream system's expectations.
Data Integration Architecture
Loading data in advance of integration is a hallmark of the ELT (Extract, Load, Transform) architecture and a highly effective way to “divide and conquer” the data integration process by explicitly splitting it into two clear steps. First, raw data from source systems is loaded directly into Amazon Redshift, and afterwards, data integration and transformation is performed within Redshift itself. Data integration always requires transforming source data to produce consistent, usable datasets, and the most robust method for accomplishing this is through well-designed data transformation pipelines managed in the database. An additional advantage of the ELT approach is that both transformation and integration tasks leverage Redshift’s native processing power, making operations fast, scalable, and triggered on demand. This in-database process removes the need for extra infrastructure dedicated to data transformation, reducing both complexity and costs typically associated with separate data processing environments.