Integrate data from Informix (IBM) to Snowflake using Matillion

Our Informix to Snowflake connector streamlines data transfer within minutes, keeping your information current without the need for manual coding or complex ETL processes.


Extracting data from Informix (IBM) to Snowflake

Extracting data from Informix is a common requirement for organizations seeking to modernize their analytics platforms or consolidate data in the cloud. Successfully moving data from Informix to Snowflake involves several technical steps to ensure data integrity, security, and efficiency. In this article, we will guide you through the key stages of this process: creating an identity in Informix, verifying or obtaining the necessary JDBC driver for Matillion users, establishing secure network connectivity between Informix and Snowflake, and designing queries to facilitate both initial and incremental data extraction. Whether you are beginning a migration project or building a recurring pipeline, these steps will provide the foundation for a reliable data transfer from Informix to Snowflake.


What is Informix (IBM)?

Informix is a robust, high-performance relational database management system (RDBMS) developed by IBM, renowned for its flexibility, scalability, and minimal administrative overhead. It supports an extensive array of data types, including traditional SQL, NoSQL (JSON/BSON), time series, and spatial data, making it well-suited for IoT, embedded, and enterprise applications. Informix features advanced replication, high availability solutions such as HDR and MACH11 clustering, and seamless integration with a wide range of programming languages and platforms. Its lightweight footprint and support for autonomous operations allow for deployment across edge devices, cloud environments, and traditional data centers, catering to organizations with demanding data management and analytics requirements.


What is Snowflake?

Snowflake is a fully managed, cloud-native data platform supporting data warehousing, lakes, and engineering. Its multi-cluster shared-data architecture separates compute, storage, and services for efficient, independent scaling. Supporting ANSI SQL, Snowflake offers features like automatic scaling, high concurrency, zero-copy cloning, and time travel for data versioning. It natively integrates with AWS, Azure, and GCP, enabling secure data sharing and collaboration with granular access controls. A pay-as-you-go model and automated maintenance reduce operational overhead, making Snowflake popular with organizations seeking elastic, cloud-agnostic analytics solutions.

Why Move Data from Informix (IBM) into Snowflake

Unlocking Analytics Potential: Transferring Data from Informix to Snowflake

A data engineer or architect may wish to copy data from Informix into Snowflake for several key reasons. Firstly, Informix databases often store business-critical information that is potentially valuable for analytics or decision-making. However, the full value of this data is often realized only when it is integrated with data from other sources, such as customer relationship management systems, marketing platforms, or external datasets. By moving the data into Snowflake, organizations can leverage Snowflake’s robust data integration and analytic capabilities to combine Informix data with other datasets, facilitating richer insights and more comprehensive reporting. Additionally, performing resource-intensive integration and analytics workloads directly on Informix can negatively impact its performance and availability for operational tasks. Offloading these activities to Snowflake not only protects Informix from unnecessary strain but also takes advantage of Snowflake’s scalable architecture for improved query performance and lower maintenance overhead.

Creating a User in Informix Database

Informix database users are typically managed through the underlying operating system, as Informix relies on OS accounts for authentication by default. However, you still need to grant the appropriate privileges within the database itself. Below are instructions for creating a user from both the operating system and Informix perspectives.

1. Create the Operating System User

On the Informix server (as root or a user with sudo privileges), create a new OS user. For example, to create a user named appuser:

sudo useradd appuser
sudo passwd appuser

Note: Use the appropriate commands for your operating system.

2. Grant Database Access

Connect to your Informix database as the informix user or another user with DBA privileges:

dbaccess sysmaster

3. Grant Database Privileges

Connect to the relevant database and grant the required privileges to the new OS user. For example, to give CONNECT and RESOURCE privileges to the user appuser:

GRANT CONNECT TO 'appuser';
GRANT RESOURCE TO 'appuser';

You may also grant specific privileges on objects as needed:

GRANT SELECT, INSERT, UPDATE 
    ON customer 
    TO 'appuser';

Note: Usernames must match exactly with the operating system account names.

4. Confirm User Access (Optional)

To test the new user's access, log into Informix as the new OS user:

su - appuser
dbaccess <database_name>

If privileges are granted properly, the user can connect and perform actions as allowed by the assigned roles and privileges.


References:

  • IBM Informix Administrator's Guide
  • IBM Informix Guide to SQL: Syntax

Installing the JDBC driver

At the time of writing, the Informix JDBC driver is not bundled with Matillion Data Productivity Cloud. This is due to licensing or redistribution restrictions imposed by IBM on their proprietary drivers. As a result, you will need to manually download and install the Informix JDBC driver before configuring database connections within the Matillion platform.

Step 1: Download the Informix JDBC Driver

Visit the IBM Informix JDBC Driver download page:
https://www.ibm.com/support/pages/informix-jdbc-drivers

When selecting a driver, prioritize obtaining the "Type 4" JDBC driver, which is a pure Java driver recommended for most use cases and environments, including Matillion. Follow the instructions on the IBM page to download the appropriate version of the Informix JDBC driver for your environment. You may be prompted to accept an end-user license agreement before downloading.

Step 2: Upload the Driver to Matillion Data Productivity Cloud

After downloading the driver, you will need to install it within your Matillion Data Productivity Cloud environment. Requirements and instructions for uploading external JDBC drivers can be found here:
https://docs.matillion.com/data-productivity-cloud/agent/docs/uploading-external-drivers/

Typically, you will:

  1. Sign in to your Matillion Data Productivity Cloud instance.
  2. Navigate to the “Agent” section of the platform.
  3. Locate the option to upload or manage external drivers.
  4. Select the Informix JDBC driver file (for example, a file with a name pattern like ifxjdbc.jar) and upload it as directed.
  5. Once uploaded, the driver should appear within the list of available adapters or drivers for use by your agent.

Consult the official instructions linked above in case the Matillion platform or driver upload workflows have changed.

Step 3: Connect to Informix Using the JDBC Driver

With the driver installed, you can now proceed to create and configure Informix database connections utilizing Matillion’s designer components. Usage instructions for connecting to a database via JDBC in Matillion Data Productivity Cloud are provided here:
https://docs.matillion.com/data-productivity-cloud/designer/docs/database-query/

These instructions cover all the configuration steps needed to leverage your externally provided Informix JDBC driver for database queries and integrations within the Matillion Data Productivity Cloud.
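To illustrate what the connection configuration typically requires, the sketch below composes an Informix Type 4 JDBC connection URL. The host, port, database, and server names are placeholder assumptions; substitute the values from your own environment (9088 is a common default port for the Informix SQLI listener).

```python
def informix_jdbc_url(host, port, database, server_name):
    """Compose an Informix Type 4 JDBC connection URL.

    The INFORMIXSERVER property names the database server instance
    as defined in the server's sqlhosts configuration.
    """
    return (
        f"jdbc:informix-sqli://{host}:{port}/{database}"
        f":INFORMIXSERVER={server_name}"
    )

# Example with placeholder values: the 'stores_demo' database
# on a server instance named 'ol_informix'
url = informix_jdbc_url("db.example.com", 9088, "stores_demo", "ol_informix")
print(url)
```

Credentials (user and password) are normally supplied separately through the Matillion connection configuration rather than embedded in the URL.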

Checking network connectivity

To connect Matillion Data Productivity Cloud to your Informix database, you must ensure that the database instance allows incoming network connections according to your deployment type:

  • Full SaaS agent configuration: You need to whitelist all IP addresses used by the Matillion Data Productivity Cloud service. The current list of IP addresses is available here: Allowing IP Addresses. Make sure your Informix database accepts incoming traffic from these IP addresses.

  • Hybrid SaaS deployment: The Informix instance should allow incoming connections from your own virtual private cloud (VPC). This ensures that the Matillion Hybrid Agent, running within your infrastructure, can access the database. To verify network access from your environment, you can use the available tools at: Check Network Access.

Additionally, if you reference the Informix database using a DNS hostname rather than an IP address, make sure that the Full SaaS or Hybrid SaaS agent has the necessary DNS resolution capability to resolve that address correctly. Without successful DNS resolution, the connection will not be established.
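To troubleshoot DNS resolution and basic reachability from the environment where your agent runs, a quick check along these lines can help; the hostname and port below are placeholders for your own Informix endpoint.

```python
import socket


def can_resolve(hostname):
    """Return True if this environment can resolve the hostname via DNS."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def can_connect(hostname, port, timeout=5):
    """Return True if a TCP connection to the database port succeeds."""
    try:
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example (placeholder endpoint): verify DNS first, then TCP reachability
# print(can_resolve("informix.example.com"))
# print(can_connect("informix.example.com", 9088))
```

If `can_resolve` fails but the IP address works directly, the issue is DNS configuration on the agent side rather than firewalling.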

Querying Data from Informix Database

This guide provides instructions for querying data from an Informix database, covering SQL examples, datatype considerations between Informix and Snowflake, and recommended patterns for initial and incremental data loads.


Examples of Informix SQL SELECT Statements

To extract data from an Informix table, use standard SQL SELECT statements. Here are some examples:

-- Select all columns from a table
SELECT * FROM customers;

-- Select specific columns
SELECT customer_id, customer_name, created_date FROM customers;

-- Filter results using a WHERE clause
SELECT * FROM orders WHERE order_date > '2024-01-01';

-- Join multiple tables
SELECT c.customer_name, o.order_total
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id;


Datatype Conversion: Informix <=> Snowflake

Be mindful that when transferring data between Informix and Snowflake, some datatypes may not have direct equivalents.

  • Informix to Snowflake examples:
    • CHAR, VARCHAR → VARCHAR
    • INTEGER → NUMBER
    • SMALLFLOAT → FLOAT
    • DATETIME YEAR TO SECOND → TIMESTAMP_NTZ
    • BYTE, TEXT → VARIANT, or staged as files
  • Before loading, review Snowflake datatype documentation and plan for any transformation or casting required.
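The mappings above can be kept in a simple lookup table that your pipeline logic consults when generating target DDL. This is an illustrative sketch, not an exhaustive mapping; extend it to cover the datatypes present in your own schema.

```python
# Illustrative Informix -> Snowflake type mapping (not exhaustive)
INFORMIX_TO_SNOWFLAKE = {
    "CHAR": "VARCHAR",
    "VARCHAR": "VARCHAR",
    "INTEGER": "NUMBER",
    "SMALLFLOAT": "FLOAT",
    "DATETIME YEAR TO SECOND": "TIMESTAMP_NTZ",
    "BYTE": "VARIANT",
    "TEXT": "VARIANT",
}


def snowflake_type(informix_type):
    """Look up the target Snowflake type for an Informix type name.

    Falls back to VARCHAR for unmapped types, which is a conservative
    default but may need revisiting for numeric or binary columns.
    """
    return INFORMIX_TO_SNOWFLAKE.get(informix_type.strip().upper(), "VARCHAR")
```

For example, `snowflake_type("integer")` returns `"NUMBER"`, while an unmapped type falls back to `"VARCHAR"`.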

Efficient Data Loading: Initial vs. Incremental

The recommended pattern is:

  • Once-off initial load: retrieve the full dataset.
  • Ongoing incremental loads: retrieve only data updated or inserted since the last load.

Use the same Database Query component for both.

1. Initial Load Pattern

For the initial load, extract all data using a SELECT statement without filters:

-- Initial full data extraction
SELECT * FROM orders;

Matillion's Database Query component would execute a query similar to the above, with no WHERE filter clause.

2. Incremental Load Pattern

For incremental loads, modify the SQL query to include a filter (commonly on a timestamp or auto-increment column):

-- Incremental data extraction using a timestamp column
SELECT * FROM orders WHERE modified_date > '${last_load_time}';

-- Incremental extraction using a numeric key
SELECT * FROM orders WHERE order_id > ${last_loaded_id};

This allows loading only new or changed records since the last successful load.

For comprehensive best practices and further reading on incremental loading, refer to: Incremental Load Data Replication Strategy (Matillion Exchange)


Note: Adjust the column and filter logic to suit your schema and incrementing field(s). Be mindful of nulls, data gaps, and other anomalies when designing incremental filters.
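The initial and incremental patterns above differ only in the presence of the watermark filter, so a single query builder can serve both. The function below is a hypothetical sketch: in a real pipeline the last-loaded value would come from a pipeline variable, and values should be bound as parameters rather than interpolated into the SQL string.

```python
def build_extract_query(table, watermark_column=None, last_value=None):
    """Build the SELECT for a Database Query component.

    With no watermark, returns the full initial-load query; with a
    watermark column and the last loaded value, returns the incremental
    query that fetches only rows newer than that value.

    Note: string interpolation is used here for illustration only;
    prefer parameter binding in production to avoid SQL injection.
    """
    query = f"SELECT * FROM {table}"
    if watermark_column is not None and last_value is not None:
        query += f" WHERE {watermark_column} > '{last_value}'"
    return query


# Initial full load
print(build_extract_query("orders"))
# Incremental run, using the timestamp recorded after the previous load
print(build_extract_query("orders", "modified_date", "2024-01-01 00:00:00"))
```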

Data Integration Architecture

Loading data into Snowflake prior to integration exemplifies the "divide and conquer" approach, as it separates the data-loading phase from the transformation and integration stage. This method, core to the ELT (Extract, Load, Transform) architecture, enhances efficiency by first bringing raw data into the target system, after which required transformations are performed within the database itself. Effective data integration consistently depends on robust transformation processes, best accomplished through data transformation pipelines, which ensure consistency, scalability, and maintainable transformation logic. Another key benefit of the ELT approach is that both data transformation and integration execute directly within the Snowflake database. This enables fast, on-demand processing that leverages Snowflake's cloud-native scalability, eliminating reliance on separate ETL servers or additional data processing infrastructure—ultimately reducing both operational complexity and cost.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.