Extracting data from IBM Netezza to Snowflake
Extracting data from IBM Netezza is a common requirement for organizations seeking to modernize their data platforms by leveraging cloud-based solutions such as Snowflake. Moving data efficiently and securely entails a series of preparatory and operational steps that ensure both systems can communicate and transfer information reliably. This article provides a practical guide to successfully extract data from IBM Netezza and load it into Snowflake. We will start by outlining how to create an appropriate identity in IBM Netezza, which is essential for secure access to your data. For Matillion users, we will discuss how to verify the presence of the required JDBC driver or obtain it if necessary. The article will then address how to establish and test network connectivity between your Netezza source and Snowflake target environments, a critical step to support smooth data movement. Finally, we will explain methods for querying Netezza data, covering both full (initial) and incremental extraction approaches to accommodate different migration or integration scenarios. By following these steps, you will be well-prepared to implement a robust data pipeline from IBM Netezza to Snowflake.
What is IBM Netezza?
IBM Netezza is a high-performance, massively parallel processing (MPP) data warehouse appliance designed for efficient storage, retrieval, and analysis of large-scale datasets. It leverages a unique architecture that combines integrated hardware, software, and storage to deliver streamlined analytic performance, utilizing Field-Programmable Gate Arrays (FPGAs) for rapid data filtering and transformation as data is loaded. Netezza operates using a simplified management model, exposing a standard SQL interface and supporting compatibility with leading BI and ETL tools while minimizing administrative overhead. Its design enables scalable, ad hoc query execution with consistent, predictable response times, making it a compelling solution for data-driven enterprises facing demanding analytics workloads.
What is Snowflake?
Snowflake is a cloud-native data platform for data warehousing, analytics, and data lakes. Its architecture separates storage and compute for flexible, cost-effective scaling. Running on AWS, Azure, and Google Cloud, Snowflake provides fully managed infrastructure with minimal manual administration. Notable features include secure data sharing, support for semi-structured data (VARIANT types), data versioning with time travel, and strong concurrency controls. A SQL-based interface and integration with popular data tools make it accessible for engineers and analysts, ensuring high performance and scalability in multi-tenant environments.
Why Move Data from IBM Netezza into Snowflake
The Benefits of Transferring Data from IBM Netezza to Snowflake for Advanced Analytics
A data engineer or architect may wish to copy data from IBM Netezza into Snowflake for several compelling reasons. IBM Netezza often houses significant amounts of potentially valuable enterprise data; however, its analytical potential is best realized when this data is integrated with information from other sources such as operational databases, cloud storage, or external datasets. Leveraging Snowflake for this integration enables organizations to perform advanced analytics and business intelligence across a much broader dataset, uncovering insights that might remain hidden within the siloed confines of Netezza. Furthermore, by offloading data integration and processing tasks to Snowflake’s cloud-based platform, organizations avoid imposing additional workloads on their IBM Netezza systems, thus preserving performance and stability for critical legacy operations. This approach allows organizations to modernize their data strategy and unlock new value without disrupting existing systems.
Creating an Identity in IBM Netezza Database
The following instructions explain how to create a user (identity) in an IBM Netezza database environment. You can execute SQL commands using `nzsql`, an administrative tool provided with Netezza.
1. Prerequisites
- You must have administrative privileges or be granted appropriate authority to create users. Generally, this requires access as an admin user (such as `nz`).
2. Syntax Overview
To create a user in IBM Netezza, use the `CREATE USER` SQL statement:
```
CREATE USER username WITH PASSWORD 'yourStrongPassword';
```
- `username`: The name of the user to be created.
- `yourStrongPassword`: The password for the user account.
3. Example: Creating a User
The example below creates a user named `data_analyst` with the password `S3cur3P@ssw0rd`.
```
CREATE USER data_analyst WITH PASSWORD 'S3cur3P@ssw0rd';
```
4. Additional Options
You can assign additional options, such as administrative permissions, or restrict the user:
| Option | Description |
|---|---|
| CREATEDB | User can create new databases |
| WITHOUT CREATEDB | User cannot create databases |
| SYSADMIN | User has system admin privileges |
| NOSYSADMIN | No sysadmin privileges |
| VALID UNTIL | Expire account on a specific date |
Example:
```
CREATE USER data_analyst WITH PASSWORD 'S3cur3P@ssw0rd'
  NOSYSADMIN
  DEFAULT_SCHEMA my_schema
  VALID UNTIL '2024-12-31';
```
5. Assigning the User to a Group
To manage permissions efficiently, add the new user to a group:
```
CREATE GROUP analysts;
ALTER GROUP analysts ADD USER data_analyst;
```
6. Using the nzsql Command-Line Tool
You can also create a user interactively at the command line:
```
nzsql -u admin_user -d system
```
Then run your `CREATE USER` command at the `nzsql` prompt.
Note: Ensure passwords comply with your organization's password policy and Netezza's password requirements. Always use secure channels and best practices for user management.
Installing the JDBC driver
The IBM Netezza JDBC driver enables connectivity between Matillion Data Productivity Cloud and Netezza databases. Due to licensing or redistribution restrictions, the Netezza JDBC driver is not included by default in Matillion’s data integrations. You must download and install this driver manually before you can establish a connection. Follow the steps below to complete this process.
1. Download the Netezza JDBC Driver
- Go to the official IBM download page for the Netezza JDBC driver: https://www.ibm.com/support/pages/ibm-netezza-jdbc-driver
- On the download page, locate and obtain the Type 4 JDBC driver (this is the most suitable type for use with Matillion).
- You will typically need to accept IBM’s license agreement before proceeding with the download.
2. Locate the Driver File
After downloading, you should have a `.jar` file (for example, `nzjdbc3.jar`). Make sure this file is stored securely and is readily accessible for the installation step.
3. Upload the JDBC Driver to Matillion
Since Matillion Data Productivity Cloud does not bundle the Netezza driver, you need to upload it manually. Detailed upload instructions are available here:
https://docs.matillion.com/data-productivity-cloud/agent/docs/uploading-external-drivers/
In summary:
- Log in to the Matillion Data Productivity Cloud user interface.
- Navigate to the Agent environment where you want the driver to be available.
- Use the options provided to upload the `.jar` file as a new external JDBC driver.
Please refer to the link above for the step-by-step process and requirements (for example, permissions or supported agent versions).
4. Configure and Use the Driver
Once the driver is installed, Matillion will recognize it in the list of available database connectors. Follow these instructions to create your first database query or connection, incorporating the newly added driver:
https://docs.matillion.com/data-productivity-cloud/designer/docs/database-query/
The documentation provides guidance on specifying connection parameters and integrating Netezza data sources using the Data Productivity Cloud Designer interface.
After following these steps, you will have access to the IBM Netezza database from your Matillion platform. Always consult the official documentation for any recent changes or driver-specific instructions.
Checking network connectivity
To connect Matillion Data Productivity Cloud to your IBM Netezza database, you must ensure that the database accepts incoming network connections according to your deployment model:
- Full SaaS agent configuration: The firewall of your IBM Netezza database must allow inbound connections from the static IP addresses used by Matillion’s SaaS platform. You can find the list of required IP addresses at: https://docs.matillion.com/data-productivity-cloud/security/docs/allowing-ip-addresses/
- Hybrid SaaS deployment: In this configuration, Matillion connects from within your own environment. Therefore, you must allow incoming connections from your own virtual private cloud (VPC) network where the Matillion agent is running. To help verify network access in this scenario, you can use the connectivity utilities available at: https://exchange.matillion.com/data-productivity-cloud/pipeline/check-network-access/
Additionally, if your IBM Netezza database is referenced using a DNS name, the Full SaaS or Hybrid SaaS agent must be able to resolve the DNS address correctly. Make sure that any applicable DNS records are accessible from the relevant network.
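A quick way to confirm both DNS resolution and firewall rules from the agent's network is a simple TCP probe. The sketch below is a generic reachability check, assuming you know your Netezza host name and listener port (the hostname shown is illustrative):

```python
import socket

def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This resolves the DNS name and opens a socket, so a True result confirms
    both name resolution and firewall access from the machine running it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example usage (host and port are hypothetical placeholders):
# print(can_reach("netezza.internal.example.com", 5480))
```

Run this from the same network as your Matillion agent; a False result points at DNS, routing, or firewall configuration rather than credentials.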
Querying Data from an IBM Netezza Database
This guide provides instructions and examples on querying data from an IBM Netezza database, with considerations for ETL practices and data integration (e.g., migrating to Snowflake). The examples use standard SQL SELECT statements, demonstrate the handling of incremental versus initial loads, and touch on datatype conversions.
Example IBM Netezza Queries
Below are some typical sample queries you might run against a Netezza database.
```
-- Fetch all columns from a table
SELECT * FROM sales.orders;

-- Select specific columns and apply a WHERE filter
SELECT order_id, customer_id, order_date
FROM sales.orders
WHERE order_date >= '2024-01-01';

-- Aggregate query: total sales amount per customer
SELECT customer_id, SUM(sale_amount) AS total_sales
FROM sales.transactions
GROUP BY customer_id;

-- Join between two tables
SELECT o.order_id, c.customer_name, o.order_date
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.status = 'SHIPPED';
```
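When scripting extraction rather than using an ETL component, queries like these are typically executed through a DB-API cursor and fetched in batches to keep memory bounded. The sketch below uses an in-memory SQLite table purely as a stand-in; against Netezza you would instead obtain the cursor from a JDBC bridge (for example, the jaydebeapi package with the nzjdbc driver, which is an assumption here, not something Netezza ships):

```python
import sqlite3

def fetch_in_batches(cursor, sql, batch_size=10_000):
    """Run a query and yield rows in fixed-size batches.

    Works with any DB-API 2.0 cursor, so the same loop serves both the
    SQLite stand-in below and a real Netezza connection.
    """
    cursor.execute(sql)
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows

# Demonstration with a stand-in table named like the examples above:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i % 3) for i in range(7)])
batches = list(fetch_in_batches(conn.cursor(),
                                "SELECT * FROM orders", batch_size=3))
print([len(b) for b in batches])  # → [3, 3, 1]
```

Batched fetching matters for large Netezza tables, where pulling the full result set into memory at once is rarely practical.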
Datatype Conversion: IBM Netezza vs. Snowflake
When moving data between IBM Netezza and Snowflake, some datatype conversion is often necessary. Common considerations include:
| IBM Netezza Datatype | Snowflake Datatype |
|---|---|
| INTEGER | NUMBER(38,0) |
| FLOAT/DOUBLE | FLOAT |
| VARCHAR(n) | VARCHAR(n) |
| CHAR(n) | CHAR(n) |
| BYTEINT | NUMBER(3,0) |
| DATE | DATE |
| TIMESTAMP | TIMESTAMP_NTZ/LTZ |
| BOOLEAN | BOOLEAN |
Note: Review your schema and test thoroughly to ensure datatype compatibility and preservation of precision.
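When generating Snowflake DDL programmatically, the table above can be encoded as a small lookup. A minimal sketch, using only the mappings listed (the fallback behavior for unlisted types is an assumption you should adapt to your schema):

```python
import re

# Direct mappings taken from the conversion table; parameterized string
# types keep their length arguments unchanged.
NETEZZA_TO_SNOWFLAKE = {
    "INTEGER": "NUMBER(38,0)",
    "FLOAT": "FLOAT",
    "DOUBLE": "FLOAT",
    "BYTEINT": "NUMBER(3,0)",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP_NTZ",
    "BOOLEAN": "BOOLEAN",
}

def to_snowflake_type(netezza_type: str) -> str:
    """Map a Netezza column type to a Snowflake equivalent."""
    t = netezza_type.strip().upper()
    # VARCHAR(n) and CHAR(n) carry over unchanged.
    if re.fullmatch(r"(VARCHAR|CHAR)\(\d+\)", t):
        return t
    # Unlisted types pass through as-is; review these case by case.
    return NETEZZA_TO_SNOWFLAKE.get(t, t)

print(to_snowflake_type("byteint"))       # → NUMBER(3,0)
print(to_snowflake_type("VARCHAR(255)"))  # → VARCHAR(255)
```

Note that TIMESTAMP is mapped to TIMESTAMP_NTZ here; choose TIMESTAMP_LTZ instead if your pipeline needs session-time-zone semantics.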
Data Extraction Best Practice: Initial and Incremental Loads
The recommended load pattern is:
- One-time Initial Load: Extracts the entire dataset without filters.
- Incremental Loads: Extracts only new or changed records by applying a filter such as a modified timestamp or ID.
Use the same Database Query component/tool in both phases—simply adjust the SQL filter.
Initial Load: No Filter Clause
```
-- Load all rows (initial load)
SELECT * FROM sales.transactions;
```
Incremental Load: With Filter Clause
```
-- Load only rows where 'last_updated' is newer than the last load time (incremental load)
SELECT * FROM sales.transactions
WHERE last_updated >= '2024-06-01 00:00:00';
```
Adjust the value '2024-06-01 00:00:00' to the timestamp of your last successful extraction.
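The "same query, different filter" pattern can be captured in a small helper that switches between the initial and incremental form based on whether a stored high-water mark exists. A sketch (table and column names are the illustrative ones used above; in production, bind the watermark as a query parameter rather than interpolating it into the SQL string):

```python
from typing import Optional

def extraction_query(table: str, watermark_column: str,
                     last_value: Optional[str]) -> str:
    """Build the extraction SQL for either phase of the load pattern.

    With no stored watermark, the initial (full) query is returned;
    otherwise, a filtered incremental query.
    """
    base = f"SELECT * FROM {table}"
    if last_value is None:
        return base + ";"
    return f"{base} WHERE {watermark_column} >= '{last_value}';"

print(extraction_query("sales.transactions", "last_updated", None))
# → SELECT * FROM sales.transactions;
print(extraction_query("sales.transactions", "last_updated",
                       "2024-06-01 00:00:00"))
# → SELECT * FROM sales.transactions WHERE last_updated >= '2024-06-01 00:00:00';
```

After each successful run, persist the maximum `last_updated` value extracted and feed it in as `last_value` next time.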
For a detailed strategy including high-water mark management, see Matillion Incremental Load Strategy.
Tip: Always validate query results when switching between IBM Netezza and Snowflake to ensure accuracy after data type conversions and filter applications.
Data Integration Architecture
Loading data into Snowflake in advance of integration embodies the "divide and conquer" approach by breaking the overall process into two manageable steps. In the context of ELT (Extract, Load, Transform) architecture, this allows organizations to first load raw source data into the target environment, and then perform data integration using separate, optimized transformation steps. Effective data integration invariably requires transformation of the loaded data, and the most efficient way to achieve this within Snowflake is through data transformation pipelines, which can automate and orchestrate the various transformation tasks. A key advantage of the ELT approach is that all data transformations and integrations are performed inside the Snowflake database itself. This means the process is fast, scalable, and available on demand, without the need to invest in or maintain separate data processing infrastructure, ultimately streamlining operations and reducing costs.