Extracting data from Informix (IBM) to Snowflake
Extracting data from Informix is a common requirement for organizations seeking to modernize their analytics platforms or consolidate data in the cloud. Successfully moving data from Informix to Snowflake involves several technical steps to ensure data integrity, security, and efficiency. In this article, we will guide you through the key stages of this process: creating an identity in Informix, verifying or obtaining the necessary JDBC driver for Matillion users, establishing secure network connectivity between Informix and Snowflake, and designing queries to facilitate both initial and incremental data extraction. Whether you are beginning a migration project or building a recurring pipeline, these steps will provide the foundation for a reliable data transfer from Informix to Snowflake.
What is Informix (IBM)?
Informix is a robust, high-performance relational database management system (RDBMS) developed by IBM, renowned for its flexibility, scalability, and minimal administrative overhead. It supports an extensive array of data types, including traditional SQL, NoSQL (JSON/BSON), time series, and spatial data, making it well-suited for IoT, embedded, and enterprise applications. Informix features advanced replication, high availability solutions such as HDR and MACH11 clustering, and seamless integration with a wide range of programming languages and platforms. Its lightweight footprint and support for autonomous operations allow for deployment across edge devices, cloud environments, and traditional data centers, catering to organizations with demanding data management and analytics requirements.
What is Snowflake?
Snowflake is a fully managed, cloud-native data platform supporting data warehousing, lakes, and engineering. Its multi-cluster shared-data architecture separates compute, storage, and services for efficient, independent scaling. Supporting ANSI SQL, Snowflake offers features like automatic scaling, high concurrency, zero-copy cloning, and time travel for data versioning. It natively integrates with AWS, Azure, and GCP, enabling secure data sharing and collaboration with granular access controls. A pay-as-you-go model and automated maintenance reduce operational overhead, making Snowflake popular with organizations seeking elastic, cloud-agnostic analytics solutions.
Why Move Data from Informix (IBM) into Snowflake
Unlocking Analytics Potential: Transferring Data from Informix to Snowflake
A data engineer or architect may wish to copy data from Informix into Snowflake for several key reasons. Firstly, Informix databases often store business-critical information that is potentially valuable for analytics or decision-making. However, the full value of this data is often realized only when it is integrated with data from other sources, such as customer relationship management systems, marketing platforms, or external datasets. By moving the data into Snowflake, organizations can leverage Snowflake’s robust data integration and analytic capabilities to combine Informix data with other datasets, facilitating richer insights and more comprehensive reporting. Additionally, performing resource-intensive integration and analytics workloads directly on Informix can negatively impact its performance and availability for operational tasks. Offloading these activities to Snowflake not only protects Informix from unnecessary strain but also takes advantage of Snowflake’s scalable architecture for improved query performance and lower maintenance overhead.
Similar connectors
Creating a User in Informix Database
Informix database users are typically managed through the underlying operating system, as Informix relies on OS accounts for authentication by default. However, you must still grant the appropriate privileges within the database itself. Below are instructions for creating a user from both the operating system and Informix perspectives.
1. Create the Operating System User
On the Informix server (as root or a user with sudo privileges), create a new OS user. For example, to create a user named appuser:

```
sudo useradd appuser
sudo passwd appuser
```
Note: Use the appropriate commands for your operating system.
2. Grant Database Access
Connect to your Informix database as the informix user or another user with DBA privileges:

```
dbaccess sysmaster
```
3. Grant Database Privileges
Connect to the relevant database and grant the required privileges to the new OS user. For example, to give CONNECT and RESOURCE privileges to the user appuser:

```
GRANT CONNECT TO 'appuser';
GRANT RESOURCE TO 'appuser';
```
You may also grant specific privileges on objects as needed:
```
GRANT SELECT, INSERT, UPDATE
ON customer
TO 'appuser';
```
Note: Usernames must match exactly with the operating system account names.
4. Confirm User Access (Optional)
To test the new user's access, log into Informix as the new OS user:
```
su - appuser
dbaccess <database_name>
```
If privileges are granted properly, the user can connect and perform actions as allowed by the assigned roles and privileges.
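When you have many accounts or tables to cover, the grant statements from step 3 can be generated programmatically. The following Python sketch only assembles the SQL text shown above (appuser and customer are the same placeholder names used in this guide); it does not execute anything against Informix.

```python
def grant_statements(user, tables=None):
    """Build the Informix GRANT statements used in step 3.

    `user` must match the OS account name exactly; `tables` is an
    optional list of tables that should receive object-level privileges.
    """
    stmts = [
        f"GRANT CONNECT TO '{user}';",
        f"GRANT RESOURCE TO '{user}';",
    ]
    for table in tables or []:
        stmts.append(f"GRANT SELECT, INSERT, UPDATE ON {table} TO '{user}';")
    return stmts

# Example: statements for the appuser account and the customer table
for stmt in grant_statements("appuser", ["customer"]):
    print(stmt)
```

Running the generated statements through dbaccess (or any SQL client connected with DBA privileges) applies them in one pass.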
References:
- IBM Informix Administrator's Guide
- IBM Informix Guide to SQL: Syntax
Installing the JDBC driver
At the time of writing, the Informix JDBC driver is not bundled with Matillion Data Productivity Cloud. This is due to licensing or redistribution restrictions imposed by IBM on their proprietary drivers. As a result, you will need to manually download and install the Informix JDBC driver before configuring database connections within the Matillion platform.
Step 1: Download the Informix JDBC Driver
Visit the IBM Informix JDBC Driver download page:
https://www.ibm.com/support/pages/informix-jdbc-drivers
When selecting a driver, prioritize obtaining the "Type 4" JDBC driver, which is a pure Java driver recommended for most use cases and environments, including Matillion. Follow the instructions on the IBM page to download the appropriate version of the Informix JDBC driver for your environment. You may be prompted to accept an end-user license agreement before downloading.
Step 2: Upload the Driver to Matillion Data Productivity Cloud
After downloading the driver, you will need to install it within your Matillion Data Productivity Cloud environment. Requirements and instructions for uploading external JDBC drivers can be found here:
https://docs.matillion.com/data-productivity-cloud/agent/docs/uploading-external-drivers/
Typically, you will:
- Sign in to your Matillion Data Productivity Cloud instance.
- Navigate to the “Agent” section of the platform.
- Locate the option to upload or manage external drivers.
- Select the Informix JDBC driver file (for example, a file with a name pattern like ifxjdbc.jar) and upload it as directed.
- Once uploaded, the driver should appear in the list of available adapters or drivers for use by your agent.
Consult the official instructions linked above in case the Matillion platform or driver upload workflows have changed.
Step 3: Connect to Informix Using the JDBC Driver
With the driver installed, you can now proceed to create and configure Informix database connections utilizing Matillion’s designer components. Usage instructions for connecting to a database via JDBC in Matillion Data Productivity Cloud are provided here:
https://docs.matillion.com/data-productivity-cloud/designer/docs/database-query/
These instructions cover all the configuration steps needed to leverage your externally provided Informix JDBC driver for database queries and integrations within the Matillion Data Productivity Cloud.
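Informix Type 4 JDBC connection strings follow the pattern jdbc:informix-sqli://host:port/database:INFORMIXSERVER=server. The Python sketch below assembles such a URL for reference; the host, port, database, and server names are placeholder assumptions, and in Matillion you would normally supply these values through the connection dialog rather than build the string by hand.

```python
def informix_jdbc_url(host, port, database, server, user=None):
    """Assemble an Informix Type 4 JDBC connection URL.

    INFORMIXSERVER must match the server name defined in the Informix
    instance's sqlhosts configuration. Credentials are usually supplied
    separately by the connecting tool rather than embedded in the URL.
    """
    url = f"jdbc:informix-sqli://{host}:{port}/{database}:INFORMIXSERVER={server}"
    if user:
        url += f";user={user}"
    return url

# Placeholder values -- substitute your own host, port, and server name
print(informix_jdbc_url("ifx.example.com", 9088, "stores_demo", "ol_informix", "appuser"))
```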
Checking network connectivity
To connect Matillion Data Productivity Cloud to your Informix database, you must ensure that the database instance allows incoming network connections according to your deployment type:
- Full SaaS agent configuration: You need to whitelist all IP addresses used by the Matillion Data Productivity Cloud service. The current list of IP addresses is available here: Allowing IP Addresses. Make sure your Informix database accepts incoming traffic from these IP addresses.
- Hybrid SaaS deployment: The Informix instance should allow incoming connections from your own virtual private cloud (VPC), so that the Matillion Hybrid Agent running within your infrastructure can reach the database. To verify network access from your environment, you can use the tools available at: Check Network Access.
Additionally, if you reference the Informix database using a DNS hostname rather than an IP address, make sure that the Full SaaS or Hybrid SaaS agent has the necessary DNS resolution capability to resolve that address correctly. Without successful DNS resolution, the connection will not be established.
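Before configuring the connection, it can help to sanity-check both DNS resolution and TCP reachability from the network where the agent runs. This standard-library Python sketch does both; the Informix hostname and port are assumptions to replace with your own values (9088 is a common Informix listener port, but yours may differ).

```python
import socket

def can_resolve(hostname):
    """Return True if this environment can resolve the hostname via DNS."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        return False

def can_connect(host, port, timeout=5):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Replace with your Informix host and listener port before running
print(can_resolve("localhost"))
```

If can_resolve fails, fix DNS for the agent first; if can_connect fails, review firewall rules and IP allow-lists as described above.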
Querying Data from Informix Database
This guide provides instructions for querying data from an Informix database, covering SQL examples, datatype considerations between Informix and Snowflake, and recommended patterns for initial and incremental data loads.
Examples of Informix SQL SELECT Statements
To extract data from an Informix table, use standard SQL SELECT statements. Here are some examples:
```
-- Select all columns from a table
SELECT * FROM customers;

-- Select specific columns
SELECT customer_id, customer_name, created_date
FROM customers;

-- Filter results using a WHERE clause
SELECT * FROM orders
WHERE order_date > '2024-01-01';

-- Join multiple tables
SELECT c.customer_name, o.order_total
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id;
```
Datatype Conversion: Informix <=> Snowflake
Be mindful that when transferring data between Informix and Snowflake, some datatypes may not have direct equivalents.
- Informix to Snowflake examples:
  - CHAR, VARCHAR → VARCHAR
  - INTEGER → NUMBER
  - SMALLFLOAT → FLOAT
  - DATETIME YEAR TO SECOND → TIMESTAMP_NTZ
  - BYTE, TEXT → VARIANT, or staged as files
- Before loading, review Snowflake datatype documentation and plan for any transformation or casting required.
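The mappings above can be captured in a simple lookup to drive schema-conversion scripts. This Python sketch encodes only the example mappings listed here; any type it does not recognize returns None, signalling that the column needs manual review against the Snowflake documentation.

```python
# Mapping based only on the examples above -- extend for your schema
INFORMIX_TO_SNOWFLAKE = {
    "CHAR": "VARCHAR",
    "VARCHAR": "VARCHAR",
    "INTEGER": "NUMBER",
    "SMALLFLOAT": "FLOAT",
    "DATETIME YEAR TO SECOND": "TIMESTAMP_NTZ",
    "BYTE": "VARIANT",
    "TEXT": "VARIANT",
}

def snowflake_type(informix_type):
    """Look up the Snowflake equivalent; None means it needs manual review."""
    return INFORMIX_TO_SNOWFLAKE.get(informix_type.upper())

print(snowflake_type("integer"))  # NUMBER
```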
Efficient Data Loading: Initial vs. Incremental
The recommended pattern is:
- Once-off initial load: retrieve the full dataset.
- Ongoing incremental loads: retrieve only data updated or inserted since the last load.
Use the same Database Query component for both.
1. Initial Load Pattern
For the initial load, extract all data by using a SELECT statement without filters:

```
-- Initial full data extraction
SELECT * FROM orders;
```

Matillion's Database Query component would execute a query similar to the above, with no WHERE filter clause.
2. Incremental Load Pattern
For incremental loads, modify the SQL query to include a filter (commonly on a timestamp or auto-increment column):
```
-- Incremental data extraction using a timestamp column
SELECT * FROM orders
WHERE modified_date > '${last_load_time}';

-- Incremental extraction using a numeric key
SELECT * FROM orders
WHERE order_id > ${last_loaded_id};
```
This allows loading only new or changed records since the last successful load.
For comprehensive best practices and further reading on incremental loading, refer to: Incremental Load Data Replication Strategy (Matillion Exchange)
Note: Adjust the column and filter logic to suit your schema and incrementing field(s). Be mindful of nulls, data gaps, and other anomalies when designing incremental filters.
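The incremental pattern can be sketched as a small query builder that substitutes the stored watermark. This Python example is purely illustrative: the table and column names are placeholders matching the examples above, and in production you should pass the watermark as a bind parameter rather than interpolating it into the SQL string.

```python
def incremental_query(table, column, watermark):
    """Build an incremental extraction query from a watermark value.

    `watermark` is the highest value successfully loaded so far: a
    timestamp string or a numeric key. Quoting here is illustrative
    only -- prefer bind parameters to avoid SQL injection.
    """
    if isinstance(watermark, (int, float)):
        predicate = f"{column} > {watermark}"
    else:
        predicate = f"{column} > '{watermark}'"
    return f"SELECT * FROM {table} WHERE {predicate};"

print(incremental_query("orders", "modified_date", "2024-01-01 00:00:00"))
print(incremental_query("orders", "order_id", 150000))
```

After each successful run, persist the new high-water mark so the next run's filter picks up where this one left off.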
Data Integration Architecture
Loading data into Snowflake prior to integration exemplifies the "divide and conquer" approach, as it separates the data-loading phase from the transformation and integration stage. This method, core to the ELT (Extract, Load, Transform) architecture, enhances efficiency by first bringing raw data into the target system, after which required transformations are performed within the database itself. Effective data integration consistently depends on robust transformation processes, best accomplished through data transformation pipelines, which ensure consistency, scalability, and maintainable transformation logic. Another key benefit of the ELT approach is that both data transformation and integration execute directly within the Snowflake database. This enables fast, on-demand processing that leverages Snowflake's cloud-native scalability, eliminating reliance on separate ETL servers or additional data processing infrastructure—ultimately reducing both operational complexity and cost.