Integrate data from IBM Netezza to Snowflake using Matillion

Our IBM Netezza to Snowflake connector enables seamless and timely data transfer to Snowflake within minutes, eliminating the need for manual coding or managing complex ETL scripts.


Extracting data from IBM Netezza to Snowflake

Extracting data from IBM Netezza is a common requirement for organizations seeking to modernize their data platforms by leveraging cloud-based solutions such as Snowflake. Moving data efficiently and securely entails a series of preparatory and operational steps that ensure both systems can communicate and transfer information reliably. This article provides a practical guide to successfully extract data from IBM Netezza and load it into Snowflake. We will start by outlining how to create an appropriate identity in IBM Netezza, which is essential for secure access to your data. For Matillion users, we will discuss how to verify the presence of the required JDBC driver or obtain it if necessary. The article will then address how to establish and test network connectivity between your Netezza source and Snowflake target environments, a critical step to support smooth data movement. Finally, we will explain methods for querying Netezza data, covering both full (initial) and incremental extraction approaches to accommodate different migration or integration scenarios. By following these steps, you will be well-prepared to implement a robust data pipeline from IBM Netezza to Snowflake.


What is IBM Netezza?

IBM Netezza is a high-performance, massively parallel processing (MPP) data warehouse appliance designed for efficient storage, retrieval, and analysis of large-scale datasets. It leverages a unique architecture that combines integrated hardware, software, and storage to deliver streamlined analytic performance, utilizing Field-Programmable Gate Arrays (FPGAs) for rapid data filtering and transformation as data is loaded. Netezza operates using a simplified management model, exposing a standard SQL interface and supporting compatibility with leading BI and ETL tools while minimizing administrative overhead. Its design enables scalable, ad hoc query execution with consistent, predictable response times, making it a compelling solution for data-driven enterprises facing demanding analytics workloads.


What is Snowflake?

Snowflake is a cloud-native data platform for data warehousing, analytics, and data lakes. Its architecture separates storage and compute for flexible, cost-effective scaling. Running on AWS, Azure, and Google Cloud, Snowflake provides fully managed infrastructure with minimal manual administration. Notable features include secure data sharing, support for semi-structured data (VARIANT types), data versioning with time travel, and strong concurrency controls. A SQL-based interface and integration with popular data tools make it accessible for engineers and analysts, ensuring high performance and scalability in multi-tenant environments.

Why Move Data from IBM Netezza into Snowflake

The Benefits of Transferring Data from IBM Netezza to Snowflake for Advanced Analytics

A data engineer or architect may wish to copy data from IBM Netezza into Snowflake for several compelling reasons. IBM Netezza often houses significant amounts of potentially valuable enterprise data; however, its analytical potential is best realized when this data is integrated with information from other sources such as operational databases, cloud storage, or external datasets. Leveraging Snowflake for this integration enables organizations to perform advanced analytics and business intelligence across a much broader dataset, uncovering insights that might remain hidden within the siloed confines of Netezza. Furthermore, by offloading data integration and processing tasks to Snowflake's cloud-based platform, organizations avoid imposing additional workloads on their IBM Netezza systems, thus preserving performance and stability for critical legacy operations. This approach allows organizations to modernize their data strategy and unlock new value without disrupting existing systems.

Creating an Identity in IBM Netezza Database

The following instructions explain how to create a user (identity) in an IBM Netezza database environment. You can execute SQL commands using `nzsql`, an administrative command-line tool provided with Netezza.

1. Prerequisites

  • You must have administrative privileges or be granted appropriate authority to create users. Generally, this requires access as an administrative user (such as the default `admin` account).

2. Syntax Overview

To create a user in IBM Netezza, use the `CREATE USER` SQL statement:

CREATE USER username WITH PASSWORD 'yourStrongPassword';

  • `username`: The name of the user to be created.
  • `yourStrongPassword`: The password for the user account.

3. Example: Creating a User

The example below creates a user named `data_analyst` with the password `S3cur3P@ssw0rd`.

CREATE USER data_analyst WITH PASSWORD 'S3cur3P@ssw0rd';

4. Additional Options

You can assign additional options, such as administrative permissions, or restrict the user:

| Option | Description |
| --- | --- |
| CREATEDB | User can create new databases |
| WITHOUT CREATEDB | User cannot create databases |
| SYSADMIN | User has system admin privileges |
| NOSYSADMIN | No sysadmin privileges |
| VALID UNTIL | Expire the account on a specific date |

Example:

CREATE USER data_analyst
  WITH PASSWORD 'S3cur3P@ssw0rd'
  NOSYSADMIN
  DEFAULT_SCHEMA my_schema
  VALID UNTIL '2024-12-31';

5. Assigning the User to a Group

To manage permissions efficiently, add the new user to a group:

CREATE GROUP analysts;
ALTER GROUP analysts ADD USER data_analyst;

6. Using the `nzsql` Command-Line Tool

You can also create a user interactively at the command-line:

```bash
nzsql -u admin_user -d system
```

Then run your `CREATE USER` command at the `nzsql` prompt.


Note: Ensure passwords comply with your organization's password policy and Netezza's password requirements. Always use secure channels and best practices for user management.

Installing the JDBC driver

The IBM Netezza JDBC driver enables connectivity between Matillion Data Productivity Cloud and Netezza databases. Due to licensing or redistribution restrictions, the Netezza JDBC driver is not included by default in Matillion’s data integrations. You must download and install this driver manually before you can establish a connection. Follow the steps below to complete this process.

1. Download the Netezza JDBC Driver

  1. Go to the official IBM download page for the Netezza JDBC driver:
    https://www.ibm.com/support/pages/ibm-netezza-jdbc-driver
  2. On the download page, locate and obtain the Type 4 JDBC driver (this is the most suitable type for use with Matillion).
  3. You will typically need to accept IBM’s license agreement before proceeding with the download.

2. Locate the Driver File

After downloading, you should have a `.jar` file (for example, `nzjdbc3.jar`). Make sure this file is stored securely and is readily accessible for the installation step.

3. Upload the JDBC Driver to Matillion

Since Matillion Data Productivity Cloud does not bundle the Netezza driver, you need to upload it manually. Detailed upload instructions are available here:
https://docs.matillion.com/data-productivity-cloud/agent/docs/uploading-external-drivers/

In summary:

  • Log in to the Matillion Data Productivity Cloud user interface.
  • Navigate to the Agent environment where you want the driver to be available.
  • Use the options provided to upload the `.jar` file as a new external JDBC driver.

Please refer to the link above for the step-by-step process and requirements (for example, permissions, or supported agent versions).

4. Configure and Use the Driver

Once the driver is installed, Matillion will recognize it in the list of available database connectors. Follow these instructions to create your first database query or connection, incorporating the newly added driver:
https://docs.matillion.com/data-productivity-cloud/designer/docs/database-query/

The documentation provides guidance on specifying connection parameters and integrating Netezza data sources using the Data Productivity Cloud Designer interface.


Following these steps, you will have access to the IBM Netezza database from your Matillion platform. Always consult the official documentation for any recent changes or driver-specific instructions.
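Once the driver is available, connections to Netezza use a standard JDBC URL of the form `jdbc:netezza://host:port/database`. As a minimal illustration (the host and database names below are hypothetical placeholders, not values from this guide), a Python sketch that assembles such a URL:

```python
def build_netezza_jdbc_url(host: str, database: str, port: int = 5480) -> str:
    """Assemble a JDBC URL for an IBM Netezza database.

    Netezza listens on port 5480 by default; pass a different port
    if your appliance uses a non-standard one.
    """
    return f"jdbc:netezza://{host}:{port}/{database}"

# Example (hypothetical host and database names):
url = build_netezza_jdbc_url("nz-prod.example.com", "SALESDB")
print(url)  # jdbc:netezza://nz-prod.example.com:5480/SALESDB
```

The same URL string is what you would supply as the connection parameter when configuring the uploaded driver in Matillion.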

Checking network connectivity

To connect Matillion Data Productivity Cloud to your IBM Netezza database, you must ensure that the database accepts incoming network connections according to your deployment model. In a Full SaaS deployment, the Netezza host and port (5480 by default) must be reachable from the Matillion-hosted agent; in a Hybrid SaaS deployment, the agent running inside your own network must be able to reach the database.

Additionally, if your IBM Netezza database is referenced using a DNS name, the Full SaaS or Hybrid SaaS agent must be able to resolve the DNS address correctly. Make sure that any applicable DNS records are accessible from the relevant network.
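As a quick sanity check, you can verify DNS resolution and TCP reachability from the machine where the agent runs before configuring the connector. A small Python sketch (the host name in the commented example is a placeholder):

```python
import socket

def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if `host` resolves via DNS and accepts a TCP
    connection on `port` within `timeout` seconds."""
    try:
        # create_connection performs both the DNS lookup and the TCP connect
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures (gaierror), refused connections, and timeouts
        return False

# Example: check a (hypothetical) Netezza host on its default port 5480
# print(can_reach("nz-prod.example.com", 5480))
```

If this returns False, investigate firewall rules, security groups, and DNS records before troubleshooting the Matillion connection itself.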

Querying Data from an IBM Netezza Database

This guide provides instructions and examples on querying data from an IBM Netezza database, with considerations for ETL practices and data integration (e.g., migrating to Snowflake). The examples use standard SQL SELECT statements, demonstrate the handling of incremental versus initial loads, and touch on datatype conversions.


Example IBM Netezza Queries

Below are some typical sample queries you might run against a Netezza database.

```
-- Fetch all columns from a table
SELECT * FROM sales.orders;

-- Select specific columns and apply a WHERE filter
SELECT order_id, customer_id, order_date
FROM sales.orders
WHERE order_date >= '2024-01-01';

-- Aggregate query: total sales amount per customer
SELECT customer_id, SUM(sale_amount) AS total_sales
FROM sales.transactions
GROUP BY customer_id;

-- Join between two tables
SELECT o.order_id, c.customer_name, o.order_date
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.status = 'SHIPPED';
```


Datatype Conversion: IBM Netezza vs. Snowflake

When moving data between IBM Netezza and Snowflake, some datatype conversion is often necessary. Common considerations include:

| IBM Netezza Datatype | Snowflake Datatype |
| --- | --- |
| INTEGER | NUMBER(38,0) |
| FLOAT/DOUBLE | FLOAT |
| VARCHAR(n) | VARCHAR(n) |
| CHAR(n) | CHAR(n) |
| BYTEINT | NUMBER(3,0) |
| DATE | DATE |
| TIMESTAMP | TIMESTAMP_NTZ/LTZ |
| BOOLEAN | BOOLEAN |

Note: Review your schema and test thoroughly to ensure datatype compatibility and preservation of precision.
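When generating Snowflake DDL from a Netezza catalog, the mapping above can be automated. A simple Python sketch (the dictionary mirrors the conversion table above; the function name `map_type` and the choice of `TIMESTAMP_NTZ` as the default timestamp target are illustrative assumptions):

```python
import re

# Mirrors the conversion table above; extend for other types in your schema.
NETEZZA_TO_SNOWFLAKE = {
    "INTEGER": "NUMBER(38,0)",
    "FLOAT": "FLOAT",
    "DOUBLE": "FLOAT",
    "BYTEINT": "NUMBER(3,0)",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP_NTZ",  # or TIMESTAMP_LTZ, depending on time zone handling
    "BOOLEAN": "BOOLEAN",
}

def map_type(netezza_type: str) -> str:
    """Translate a Netezza type name to its Snowflake equivalent.

    Parameterized character types such as VARCHAR(n) and CHAR(n)
    carry their length specification through unchanged.
    """
    t = netezza_type.strip().upper()
    if re.fullmatch(r"(VARCHAR|CHAR)\(\d+\)", t):
        return t
    return NETEZZA_TO_SNOWFLAKE.get(t, t)  # fall back to the original name
```

For example, `map_type("BYTEINT")` yields `NUMBER(3,0)` while `map_type("VARCHAR(255)")` passes through as `VARCHAR(255)`.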


Data Extraction Best Practice: Initial and Incremental Loads

The recommended load pattern is:

  1. One-time Initial Load: Extracts the entire dataset without filters.
  2. Incremental Loads: Extracts only new or changed records by applying a filter such as a modified timestamp or ID.

Use the same Database Query component/tool in both phases—simply adjust the SQL filter.

Initial Load: No Filter Clause

-- Load all rows (initial load)
SELECT * FROM sales.transactions;

Incremental Load: With Filter Clause

-- Load only rows where 'last_updated' is newer than the last load time (incremental load)
SELECT *
FROM sales.transactions
WHERE last_updated >= '2024-06-01 00:00:00';

Adjust the value `'2024-06-01 00:00:00'` to match the timestamp of your last successful extraction.
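The initial/incremental pattern above can be driven by a stored high-water mark. A Python sketch of the idea (table and column names are placeholders; in Matillion this logic is usually expressed with pipeline variables rather than custom code):

```python
from typing import Optional

def build_extract_query(table: str, hwm_column: str, last_hwm: Optional[str]) -> str:
    """Build the extraction SQL for one load.

    With no recorded high-water mark (the first run), perform a full
    initial load; otherwise filter to rows changed since the last
    successful extraction.
    """
    if last_hwm is None:
        return f"SELECT * FROM {table};"
    return f"SELECT * FROM {table} WHERE {hwm_column} >= '{last_hwm}';"

# First run: full initial load
print(build_extract_query("sales.transactions", "last_updated", None))
# Subsequent run: incremental load from the saved high-water mark
print(build_extract_query("sales.transactions", "last_updated", "2024-06-01 00:00:00"))
```

After each successful load, persist the maximum `last_updated` value observed so the next run can use it as its filter.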

For a detailed strategy including high-water mark management, see Matillion Incremental Load Strategy.


Tip: Always validate query results when switching between IBM Netezza and Snowflake to ensure accuracy after data type conversions and filter applications.

Data Integration Architecture

Loading data into Snowflake in advance of integration embodies the "divide and conquer" approach by breaking the overall process into two manageable steps. In the context of ELT (Extract, Load, Transform) architecture, this allows organizations to first load raw source data into the target environment, and then perform data integration using separate, optimized transformation steps. Effective data integration invariably requires transformation of the loaded data, and the most efficient way to achieve this within Snowflake is through data transformation pipelines, which can automate and orchestrate the various transformation tasks. A key advantage of the ELT approach is that all data transformations and integrations are performed inside the Snowflake database itself. This means the process is fast, scalable, and available on demand, without the need to invest in or maintain separate data processing infrastructure, ultimately streamlining operations and reducing costs.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.