Extracting data from IBM AS/400 to Snowflake
Extracting data from IBM AS/400 is a critical step for organizations seeking to leverage modern cloud-based analytics platforms such as Snowflake. The AS/400, now known as IBM i, continues to underpin essential business operations across numerous enterprises, yet its data is often siloed from contemporary data environments. Successfully integrating AS/400 data into Snowflake facilitates advanced analytics, business intelligence, and data-driven decision-making. This article outlines a practical approach for extracting and loading AS/400 data into Snowflake, focusing on proven methods with Matillion.

We will begin by detailing the process of creating an appropriate identity—a login or user profile—on the IBM AS/400 system to facilitate secure data access. For organizations using Matillion as their ETL tool, we will describe how to verify the presence of the necessary JDBC driver for AS/400 connectivity or acquire it if absent. Ensuring reliable and secure network connectivity between your AS/400 source and Snowflake target is another essential consideration, which we will address in the subsequent section. Finally, we will discuss methods for querying data—both when performing the initial data extraction and when implementing incremental loads to keep your Snowflake environment up to date. By following these steps, you will establish a robust, automated pipeline for AS/400 to Snowflake data integration, supporting your organization’s modernization and analytics objectives.
What is IBM AS/400?
The IBM AS/400 database, originally DB2/400 and now Db2 for i, is an integrated RDBMS within the IBM i operating system on IBM Power Systems. Unlike standalone RDBMSs, it is tightly linked with the OS for seamless security, backup, and journaling. Db2 for i supports SQL and DDS, enabling legacy compatibility and modern workloads. It features referential integrity, constraints, triggers, distributed data access, and a unique single-level storage that abstracts physical and logical files. This design ensures high reliability, easy management, and strong performance for both transactional and analytical applications.
What is Snowflake?
Snowflake is a fully managed, cloud-native data warehouse platform engineered to handle large-scale data analytics with high performance and scalability. Built on a proprietary multi-cluster, shared data architecture, Snowflake separates storage and compute resources, enabling customers to independently scale workloads and optimize costs. It supports diverse data types and workloads, including structured and semi-structured data (such as JSON, Avro, and Parquet) without the need for complex ETL pipelines. Snowflake provides robust security features, automatic scaling, and zero management overhead, all accessible via ANSI SQL, and integrates natively with major cloud providers like AWS, Azure, and Google Cloud Platform, making it a versatile choice for modern, cloud-centric data solutions.
Why Move Data from IBM AS/400 into Snowflake
Unlocking Advanced Analytics: The Value of Transferring IBM AS/400 Data to Snowflake
A data engineer or architect might seek to copy data from IBM AS/400 into Snowflake for several compelling reasons. First, the IBM AS/400, also known as IBM iSeries, often holds vast stores of historically and operationally significant data that can be invaluable for business insights and analytics. However, this data’s value is truly realized when it is integrated with information from other organizational sources to enable cross-functional analysis, advanced reporting, and machine learning applications. By bringing AS/400 data into Snowflake—a cloud-based, scalable data platform—organizations can perform comprehensive integration, transformation, and analysis without imposing additional processing demands on the legacy AS/400 system. This approach preserves the performance and reliability of mission-critical operations on the AS/400 while unlocking new opportunities for data-driven decision making in Snowflake’s modern analytics environment.
Creating a User in IBM AS/400 (IBM i) Database
IBM AS/400 (now known as IBM i) distinguishes between operating system users and database users; the operating system user profile also serves as your database user. You manage users via OS/400 commands or via SQL if using newer features. Below are common methods for creating a user.
Method 1: Using the OS/400 Command (CRTUSRPRF)
This is the traditional and most widely used approach.
- Sign on to your IBM i system with a user profile that has security administration rights (for example, *SECADM special authority, which QSECOFR holds).
- Run the following command in an interactive CL session or using the IBM Navigator for i interface:
CRTUSRPRF USRPRF(NEWUSER) PASSWORD(MyPwd123) USRCLS(*USER) TEXT('App database user')
- USRPRF - The name of the new user profile.
- PASSWORD - The initial password for the user.
- USRCLS - The user class (commonly *USER or *SECOFR).
- TEXT - A description for the user profile.
Method 2: Using SQL
From IBM i 7.2 onwards, SQL procedures can be used for user management.
- Connect as QSECOFR or a profile with requisite authorities.
- Create the user using SQL procedural interface:
CALL QSYS.CREATE_USER_PROFILE(
  USER_PROFILE => 'NEWUSER',
  PASSWORD => 'MyPwd123',
  USER_CLASS => 'USER',
  TEXT_DESCRIPTION => 'App database user'
);
- Note: The procedure name and its capabilities depend on the IBM i OS version and the underlying security policies.
Granting Database Permissions
Once the user is created, use SQL GRANT statements to provide database access. Example:
GRANT SELECT, INSERT, UPDATE, DELETE ON MYSCHEMA.MYTABLE TO USER NEWUSER;
This assigns the required privileges on the specific table to the newly created user.
Note: Replace the example values (NEWUSER, MyPwd123, and the schema and table names) with values appropriate to your environment and naming conventions. Always enforce your organization’s password and security policies.
Installing the JDBC driver
At the time of writing, Matillion Data Productivity Cloud does not include the IBM AS/400 JDBC driver by default. This is due to licensing and redistribution restrictions imposed by IBM. In order to connect to an IBM AS/400 database, you must manually obtain and install the appropriate JDBC driver onto your Matillion Data Productivity Cloud environment.
1. Download the IBM AS/400 JDBC Driver
- Navigate to the official IBM support page to obtain the JDBC driver: https://www.ibm.com/support/pages/node/619313.
- When selecting the driver, ensure you choose the "Type 4 JDBC driver" rather than a Type 2 driver. A Type 4 driver is recommended as it is a pure Java implementation and does not require any native library dependencies.
- You may need to log in to the IBM Support Portal or accept license agreements before downloading. The driver is typically packaged as a single JAR file, such as jt400.jar.
2. Prepare the Driver for Upload
Once the appropriate Type 4 driver has been downloaded, make sure it is accessible from your machine. Do not unpack the JAR file; it must be uploaded to Matillion as a single file.
3. Upload the JDBC Driver into Matillion Data Productivity Cloud
- Visit the Matillion Data Productivity Cloud documentation for detailed instructions on uploading external drivers: https://docs.matillion.com/data-productivity-cloud/agent/docs/uploading-external-drivers/.
- Follow the steps provided to upload the driver:
- Access the “External Drivers” configuration panel in your Matillion environment.
- Use the interface to upload the downloaded .jar file.
- Ensure you assign the correct scope (such as which agents or environments require access to this driver).
- After uploading, you may need to restart the agent or any running sessions to make the new driver available for use.
4. Connect to IBM AS/400 from Matillion
Once the driver is available, you can create connections that utilize the IBM AS/400 JDBC type 4 driver. For step-by-step guidelines on configuring and using database queries with your uploaded driver, refer to the product documentation: https://docs.matillion.com/data-productivity-cloud/designer/docs/database-query/.
Always ensure that the JDBC driver version is compatible with your version of the IBM AS/400 database and with the Matillion Data Productivity Cloud platform. If in doubt, consult IBM's release notes and the Matillion documentation.
Checking network connectivity
To enable successful connectivity between Matillion Data Productivity Cloud and your IBM AS/400 database, you must ensure that the database server is configured to accept incoming connections according to your product deployment type:
- Full SaaS agent configuration: Allow incoming connections from the range of public IP addresses used by Matillion Data Productivity Cloud. The complete and current list of IP addresses is available at: https://docs.matillion.com/data-productivity-cloud/security/docs/allowing-ip-addresses/
- Hybrid SaaS deployment: Permit incoming connections from the IP addresses used in your organization's Virtual Private Cloud (VPC) environment that hosts the Matillion Data Productivity Cloud agent. You can use the utilities provided at https://exchange.matillion.com/data-productivity-cloud/pipeline/check-network-access/ to verify network access and connectivity from your VPC to the IBM AS/400 database.
Additionally, if the IBM AS/400 database is accessed using a DNS hostname rather than a static IP address, the Matillion Data Productivity Cloud agent (whether Full SaaS or Hybrid SaaS) must be able to resolve the database's DNS address. Please ensure that any required DNS lookups can be performed successfully from the agent's network location.
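As a quick pre-flight check, both requirements above—DNS resolution and TCP reachability—can be verified from the agent's network location with a short script. This is an illustrative sketch: the hostname below is a placeholder, and the port depends on your setup (8471 is the default Db2 for i database server port used by the jt400 JDBC driver; DRDA access typically uses 446).

```python
import socket

def can_resolve(host: str) -> bool:
    """Check that the database hostname resolves via DNS."""
    try:
        socket.gethostbyname(host)
        return True
    except socket.gaierror:
        return False

def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Check that a TCP connection can be opened to the database port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    host = "as400.example.com"  # placeholder: your AS/400 hostname
    port = 8471                 # assumed default Db2 for i database server port
    print("DNS resolves:", can_resolve(host))
    if can_resolve(host):
        print("TCP reachable:", can_connect(host, port))
```

Running this from the same network as the agent (for Hybrid SaaS, from inside the VPC) gives an early signal before troubleshooting at the JDBC layer.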
Querying Data from an IBM AS/400 Database
This guide provides technical instructions for querying data from an IBM AS/400 (now IBM i) database, including SQL SELECT statement examples, datatype compatibility considerations, and best practices for initial and incremental data loading—particularly when integrating with platforms such as Snowflake using tools like the Matillion ETL Database Query component.
1. Example IBM AS/400 SQL SELECT Queries
An IBM AS/400 database supports ANSI-standard SQL with some system-specific conventions. The default schema is often the user profile, and table names are typically in uppercase.
Basic SELECT Example:
SELECT * FROM SCHEMA_NAME.TABLE_NAME;
Selecting Specific Columns:
SELECT CUSTOMER_ID, ORDER_DATE, AMOUNT FROM SALES.CUSTOMER_ORDERS;
Filter Using WHERE Clause:
SELECT * FROM INVENTORY.PRODUCTS WHERE CATEGORY = 'ELECTRONICS';
Incremental Extraction by Timestamp:
SELECT * FROM SCHEMA_NAME.TABLE_NAME WHERE LAST_MODIFIED_TIMESTAMP > '2024-06-01 00:00:00';
2. Datatype Conversion Between IBM AS/400 and Snowflake
When moving or querying data from AS/400 to Snowflake, be aware of potential datatype conversions:
| AS/400 Data Type | Typical Snowflake Mapping | Consideration |
|---|---|---|
| CHAR(n), VARCHAR(n) | STRING | Can generally map directly |
| DECIMAL, NUMERIC | NUMBER | Precision/scale should match |
| DATE, TIME, TIMESTAMP | DATE, TIME, TIMESTAMP | Format compatibility is important |
| SMALLINT, INTEGER | NUMBER | Check value ranges |
| BLOB, CLOB | BINARY (BLOB), STRING (CLOB) | May require special handling |
Test and validate each critical field during transfer, as AS/400 can have unique encodings (e.g., EBCDIC) or formats.
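When generating Snowflake DDL from AS/400 column metadata, the mapping table above can be encoded as a small helper. This is an illustrative sketch of one common set of choices (here BLOB maps to BINARY and CLOB to STRING), not an exhaustive type catalogue—extend and adjust it for your schemas:

```python
# Illustrative AS/400 (Db2 for i) to Snowflake type mapping.
AS400_TO_SNOWFLAKE = {
    "CHAR": "STRING",
    "VARCHAR": "STRING",
    "DECIMAL": "NUMBER",
    "NUMERIC": "NUMBER",
    "DATE": "DATE",
    "TIME": "TIME",
    "TIMESTAMP": "TIMESTAMP",
    "SMALLINT": "NUMBER",
    "INTEGER": "NUMBER",
    "BLOB": "BINARY",
    "CLOB": "STRING",
}

def map_type(as400_type, precision=None, scale=None):
    """Return a Snowflake type for an AS/400 type name, carrying
    precision/scale through for DECIMAL/NUMERIC columns."""
    base = AS400_TO_SNOWFLAKE.get(as400_type.upper())
    if base is None:
        raise ValueError(f"Unmapped AS/400 type: {as400_type}")
    if base == "NUMBER" and precision is not None:
        return f"NUMBER({precision},{scale or 0})"
    return base

print(map_type("DECIMAL", 9, 2))  # NUMBER(9,2)
print(map_type("VARCHAR"))        # STRING
```

Carrying precision and scale through explicitly is what keeps DECIMAL/NUMERIC columns from silently losing digits on the Snowflake side.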
3. Initial Load vs. Incremental Loads
A best practice for integrating legacy data (such as with Matillion ETL) is to use a two-step data replication pattern:
Initial Load
- Description: Capture all records in the table.
- Database Query Component: No filter (WHERE) clause.
- SQL Example:
SELECT * FROM SCHEMA_NAME.TABLE_NAME;
Incremental Load
- Description: Extract just the new or changed data since the last load.
- Database Query Component: Uses a filter to select records added/updated since a known point.
- Typical Filter: WHERE clause using a timestamp or incremental key.
- SQL Example:
SELECT * FROM SCHEMA_NAME.TABLE_NAME WHERE LAST_MODIFIED_TIMESTAMP > ?;
The ? placeholder is replaced with the last successful load's timestamp or identifier.
For in-depth guidance, refer to Incremental Load Data Replication Strategy (Matillion Exchange).
- Use the same Database Query component for initial and incremental phases; just adjust the filter logic as needed.
- Store and update the "high water mark" (e.g., last loaded timestamp) for reliable incremental replication.
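The initial/incremental pattern above can be sketched as a simple loop around a persisted high water mark: if no watermark exists yet, run the unfiltered initial-load query; otherwise, filter on the timestamp column and update the watermark after each successful load. The table, column, and state-file names below are placeholders for illustration:

```python
import json
from pathlib import Path

STATE_FILE = Path("watermark.json")  # placeholder: where the high water mark is persisted

def load_watermark():
    """Return the last successfully loaded timestamp, or None for an initial load."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())["last_modified"]
    return None

def save_watermark(ts):
    """Persist the high water mark after a successful load."""
    STATE_FILE.write_text(json.dumps({"last_modified": ts}))

def build_query(table, ts_column):
    """Build the extraction query: unfiltered for the initial load,
    filtered on the timestamp column for incremental loads."""
    wm = load_watermark()
    if wm is None:
        return f"SELECT * FROM {table}", ()
    return f"SELECT * FROM {table} WHERE {ts_column} > ?", (wm,)

# Initial load: no watermark yet, so no WHERE clause.
sql, params = build_query("SCHEMA_NAME.TABLE_NAME", "LAST_MODIFIED_TIMESTAMP")
# After a successful load, record the maximum timestamp seen in the extract:
save_watermark("2024-06-01 00:00:00")
# Subsequent runs carry the incremental filter and a bound parameter.
sql, params = build_query("SCHEMA_NAME.TABLE_NAME", "LAST_MODIFIED_TIMESTAMP")
```

In Matillion the same effect is achieved by keeping the watermark in a pipeline variable or a control table and substituting it into the Database Query component's filter; the script only illustrates the state transitions.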
References
- Official IBM i SQL Reference
- Matillion Exchange: Incremental Load Data Replication Strategy
Data Integration Architecture
Loading data in advance of integration—often referred to as the 'Extract, Load, and Transform' (ELT) approach—allows organizations to break down the data management process into two manageable steps. By separating data ingestion from transformation, ELT enables teams to "divide and conquer," first swiftly loading raw data into Snowflake and then addressing integration and transformation tasks as distinct phases. Once the data is loaded, integrating data from multiple sources requires structured data transformation, which is most effectively accomplished using data transformation pipelines. These pipelines enable logical, repeatable transformation steps that ensure consistent and reliable integration results. Importantly, a key advantage of the ELT architecture in Snowflake is that both integration and transformation occur entirely within the target database. This means that data processing benefits from Snowflake’s inherent scalability and performance, supporting rapid, on-demand data integration without requiring separate ETL servers or external processing resources—thus reducing infrastructure costs and operational overhead.