Extracting data from HP NonStop SQL/MX to Amazon Redshift
Extracting data from HP NonStop SQL/MX is a critical step for organizations seeking to modernize analytics and leverage the scalability of Amazon Redshift. This article provides a concise, step-by-step guide to successfully move your data from HP NonStop SQL/MX into Redshift, ensuring data integrity and minimal downtime. We will begin by showing you how to create an appropriate identity within HP NonStop SQL/MX, enabling secure and structured access to your source data. For those using Matillion ETL, we will outline the prerequisites for obtaining and configuring the necessary JDBC driver to facilitate the connection. Next, we will discuss best practices for establishing network connectivity between your NonStop SQL/MX environment and Redshift, addressing potential security and latency considerations. Finally, the article will guide you through querying and extracting your data, both for the initial load and for setting up efficient incremental updates. Whether you are undertaking a one-time migration or planning ongoing synchronization between HP NonStop SQL/MX and Redshift, this roadmap will help you get started with confidence.
What is HP NonStop SQL/MX?
HP NonStop SQL/MX is a high-performance, distributed relational database management system engineered for the HPE NonStop server platform, renowned for its unmatched availability and scalability in mission-critical environments. SQL/MX supports ANSI SQL standards and offers advanced features such as dynamic query optimization, parallel execution, and native Java and ODBC/JDBC interfaces. It is designed to handle large-scale OLTP (Online Transaction Processing) and analytical workloads with robust fault tolerance, leveraging NonStop’s shared-nothing architecture to ensure continuous operation even in the event of component failures. With comprehensive security controls, automatic data distribution, and strong workload management capabilities, NonStop SQL/MX is ideally suited for industries demanding 24x7 operation, such as banking, telecommunications, and retail.
What is Amazon Redshift?
Amazon Redshift is a fully managed, petabyte-scale data warehouse built on modified PostgreSQL for high performance and scalability in the cloud. Leveraging columnar storage, data compression, and massively parallel processing (MPP), it efficiently stores and queries large datasets. Redshift integrates with AWS services like S3 and supports Redshift Spectrum for direct queries on S3 data without import. Security features include VPC, encryption at rest/in transit, and IAM-based access control to protect data. Its robust architecture and seamless integrations make Redshift ideal for analytics, business intelligence, and reporting workloads in modern cloud environments.
Why Move Data from HP NonStop SQL/MX into Amazon Redshift
The Case for Replicating Data from HP NonStop SQL/MX to Amazon Redshift for Advanced Analytics
Data engineers and architects may seek to copy data from HP NonStop SQL/MX into Amazon Redshift for several compelling reasons. First, HP NonStop SQL/MX often houses business-critical transactional data that holds significant potential value for analytics and business intelligence initiatives. However, to fully realize this value, it is often desirable to integrate data from HP NonStop SQL/MX with information from other enterprise data sources, enabling comprehensive, cross-functional insights that cannot be achieved by analyzing each source in isolation. Leveraging Redshift as the target platform for such integration not only provides a powerful, scalable environment optimized for complex analytics, but also serves to offload analytical and reporting workloads from the HP NonStop SQL/MX system, thereby preserving its performance for its primary operational functions and minimizing potential disruptions to mission-critical processes.
Creating an Identity in HP NonStop SQL/MX Database
This guide provides step-by-step instructions for creating a user (identity) in an HP NonStop SQL/MX database. In NonStop SQL/MX, users are managed as system-level identities rather than as database users in the traditional sense. Access control within the database is handled via authorization IDs.
Prerequisites
- You must have sufficient privileges, typically `DB__ADMIN`, or have been granted the `CREATE USER` right.
- Ensure you are connected to the SQL/MX environment and have access via a recognized command interface, such as SQLCI or MXCS.
Step 1: Create an Identity (User)
In SQL/MX, use the `CREATE USER` statement to create a new identity:
CREATE USER myapp_user
IDENTIFIED BY 'StrongPassword123'
[DEFAULT_SCHEMA my_schema]
[DEFAULT_ROLE my_role];
- Replace `myapp_user` with the desired user name.
- Replace `StrongPassword123` with a secure password.
- The optional `DEFAULT_SCHEMA` clause specifies the default schema for the user session.
- Optionally, set a `DEFAULT_ROLE` to automatically enable it upon connection.
Example: Creating a Basic User
CREATE USER reports_user IDENTIFIED BY 'Summer2024!';
Example: Creating a User with a Default Schema
CREATE USER app_user IDENTIFIED BY 'Password123!' DEFAULT_SCHEMA prod_schema;
Step 2: Granting Privileges
New users have no database privileges by default. Assign the appropriate database rights using the `GRANT` statement:
GRANT SELECT, INSERT, UPDATE ON prod_schema.sales_data TO app_user;
Step 3: Verifying the New User
To verify that the user has been created, you can query the system catalog view:
SELECT * FROM "_SYSTEM"."USERS" WHERE USER_NAME = 'APP_USER';
Notes
- Password complexity and expiration policies are enforced per system configuration.
- If you integrate with Guardian, ensure system-level user IDs match as needed.
- For enterprise authentication, consider integration with LDAP or other security services if configured.
For more advanced options and integration with NonStop security, consult the official HPE NonStop SQL/MX Database Management Manual.
Tip: Always adhere to your organization's security policies when creating and managing users and passwords.
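When provisioning users repeatedly (for example, one identity per environment), it can help to script the statement generation. The sketch below is a hypothetical helper, not part of SQL/MX or Matillion: it simply renders the `CREATE USER` and `GRANT` statements shown above so they can be fed to SQLCI or an ODBC/JDBC session. Quoting and naming rules are assumptions; adapt them to your site's conventions.

```python
# Hypothetical helpers that render the SQL/MX user-provisioning statements
# shown above. They only build SQL text; executing it (via SQLCI, ODBC, or
# JDBC) is left to your tooling.

def render_create_user(user, password, default_schema=None, default_role=None):
    """Build a CREATE USER statement like the examples above."""
    parts = [f"CREATE USER {user}", f"IDENTIFIED BY '{password}'"]
    if default_schema:
        parts.append(f"DEFAULT_SCHEMA {default_schema}")
    if default_role:
        parts.append(f"DEFAULT_ROLE {default_role}")
    return "\n".join(parts) + ";"


def render_grant(privileges, table, user):
    """Build a GRANT statement for the given privileges on one table."""
    return f"GRANT {', '.join(privileges)} ON {table} TO {user};"
```

For example, `render_create_user("app_user", "Password123!", default_schema="prod_schema")` reproduces the second example above, and `render_grant(["SELECT", "INSERT", "UPDATE"], "prod_schema.sales_data", "app_user")` reproduces the Step 2 statement.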
Installing the JDBC Driver
The HP NonStop SQL/MX JDBC driver, required for connectivity from Matillion Data Productivity Cloud, is not included by default with the product due to licensing and redistribution constraints. Therefore, customers must obtain the driver directly from the vendor. The following steps outline how to acquire, upload, and configure the driver for use with Matillion Data Productivity Cloud.
1. Download the HP NonStop SQL/MX JDBC Driver
To acquire the driver:
- Visit the HP support site at https://support.hpe.com/.
- Authenticate or register for an account if prompted.
- Search for the “HP NonStop SQL/MX JDBC driver.”
- When multiple versions are available, select a Type 4 JDBC driver (a pure Java driver) for highest compatibility and ease of deployment.
- Download the JDBC driver `.jar` file to your local machine, ensuring you note the exact file location.
Note: Due to vendor restrictions, Matillion cannot redistribute this driver and it must be procured independently.
2. Install the JDBC Driver in Matillion Data Productivity Cloud
The Matillion Data Productivity Cloud allows administrators to upload external JDBC drivers to its agent environment. To do this:
- Review the official installation guide for external drivers: Uploading external drivers.
- Follow the documented procedure for uploading:
- Access the agent’s administration interface.
- Navigate to the section for managing JDBC drivers or "External Drivers."
- Use the interface to upload the previously downloaded Type 4 JDBC driver .jar file for HP NonStop SQL/MX.
- Confirm the upload was successful and the driver is listed accordingly.
- Restart or reload the agent environment as necessary to activate the new driver.
3. Configure Use of the Driver Within Matillion
After uploading the driver:
- Reference the Database Query documentation for instructions on how to create a connection within Matillion workflows using the newly installed HP NonStop SQL/MX JDBC driver.
- During the connection configuration, select the uploaded driver and supply the necessary connection parameters for your NonStop SQL/MX environment.
By following these steps, you will have the HP NonStop SQL/MX JDBC driver ready for use within Matillion Data Productivity Cloud.
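When supplying connection parameters, it is often convenient to assemble the JDBC URL programmatically. The sketch below is illustrative only: the `jdbc:t4sqlmx://` scheme and the property syntax are assumptions based on common Type 4 driver conventions, and the port is a placeholder — confirm both against the manual that ships with the driver version you downloaded.

```python
# Illustrative sketch of assembling a JDBC URL for the Type 4 driver.
# ASSUMPTIONS: the "jdbc:t4sqlmx://" scheme, the key=value;key=value property
# syntax, and the example port are placeholders -- verify them against the
# documentation for your specific driver version.

def build_jdbc_url(host, port, **properties):
    """Assemble a URL of the form jdbc:t4sqlmx://host:port/key=value;..."""
    url = f"jdbc:t4sqlmx://{host}:{port}/"
    if properties:
        url += ";".join(f"{k}={v}" for k, v in sorted(properties.items()))
    return url
```

For example, `build_jdbc_url("db.example.com", 18650, catalog="CAT1", schema="PROD")` yields a single URL string you can paste into the Matillion connection dialog alongside the uploaded driver.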
Checking network connectivity
To ensure successful connectivity between the Matillion Data Productivity Cloud and your HP NonStop SQL/MX database, you must verify that the database allows incoming network connections. This requirement varies depending on your deployment configuration:
- Full SaaS Agent Configuration: Allow incoming connections from the IP addresses used by Matillion Data Productivity Cloud. The required IP address ranges are published and regularly updated at https://docs.matillion.com/data-productivity-cloud/security/docs/allowing-ip-addresses/. You need to update your firewalls or network access controls to permit access from these addresses.
- Hybrid SaaS Deployment: Allow incoming connections from your own virtual private cloud (VPC), where the Matillion agent is hosted. To assist with network testing and configuration, you can use the utilities available at https://exchange.matillion.com/data-productivity-cloud/pipeline/check-network-access/.
Note: If you are referencing the HP NonStop SQL/MX database using a DNS name (rather than a direct IP address), ensure that the Full SaaS or Hybrid SaaS agent is able to resolve the hostname. Proper DNS resolution is essential for the agent to initiate a successful database connection.
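Both checks (DNS resolution and TCP reachability) can be verified with a short script run from the machine or container where the agent executes. This is a generic network probe, not a Matillion utility; the hostname and port below are placeholders — substitute the address and listener port of your own SQL/MX environment.

```python
# Quick probe: can this host resolve the database hostname, and can it open
# a TCP connection to the listener port? Run it from wherever the Matillion
# agent executes. Hostname and port are placeholders for your environment.

import socket

def check_endpoint(host, port, timeout=5.0):
    """Return DNS resolution and TCP reachability status for host:port."""
    result = {"host": host, "port": port, "resolved_ip": None, "reachable": False}
    try:
        result["resolved_ip"] = socket.gethostbyname(host)  # DNS resolution
    except socket.gaierror:
        return result                                       # cannot resolve: stop here
    try:
        with socket.create_connection((result["resolved_ip"], port), timeout=timeout):
            result["reachable"] = True                      # TCP handshake succeeded
    except OSError:
        pass                                                # refused, filtered, or timed out
    return result
```

A result with `resolved_ip` set but `reachable` False usually points to a firewall or access-control rule; `resolved_ip` of None points to the DNS issue described in the note above.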
Querying Data from HP NonStop SQL/MX Database
This guide explains how to query data from an HP NonStop SQL/MX database, particularly in the context of ETL or data replication to systems such as Amazon Redshift. It provides SQL examples and best practices for loading data incrementally or in full loads using the Database Query component.
Example Queries: HP NonStop SQL/MX Syntax
Below are sample SQL SELECT statements for querying data from an HP NonStop SQL/MX database:
```
-- Select all rows from a table
SELECT * FROM SALES.TRANSACTIONS;

-- Select specific columns with a filter
SELECT CUSTOMER_ID, AMOUNT, TRANSACTION_DATE
FROM SALES.TRANSACTIONS
WHERE TRANSACTION_DATE >= DATE '2024-01-01';

-- Join example
SELECT c.CUSTOMER_NAME, t.AMOUNT
FROM SALES.CUSTOMERS c
JOIN SALES.TRANSACTIONS t ON c.CUSTOMER_ID = t.CUSTOMER_ID
WHERE t.AMOUNT > 1000;
```
Datatype Conversion Implications
When querying data in this context, keep in mind HP NonStop SQL/MX and Amazon Redshift have different data types. For example:

- `VARCHAR` (SQL/MX) maps to `VARCHAR` (Redshift), but maximum lengths may differ.
- `DATE` and `TIMESTAMP` types typically convert directly. In cases with unsupported precision, you may need to cast or truncate.
- Numeric type precision and scale (e.g., `DECIMAL`, `INTEGER`) should be reviewed between systems.

Tip: If type mapping issues arise, use SQL/MX casting functions (e.g., `CAST(amount AS DECIMAL(10,2))`) within your SELECT statements.
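One way to keep these decisions auditable is to maintain an explicit mapping table in your pipeline tooling. The sketch below is illustrative and deliberately incomplete — the pairings are reasonable defaults for a typical migration, not an official compatibility matrix; validate each column's precision and length against both systems.

```python
# An illustrative (not exhaustive) mapping from SQL/MX source types to
# Redshift target types. These pairings are assumptions for a typical
# migration, not an official matrix -- review precision, scale, and length
# for every column before relying on them.

SQLMX_TO_REDSHIFT = {
    "CHAR":      "CHAR",        # fixed-length; check length limits on both sides
    "VARCHAR":   "VARCHAR",     # maximum lengths may differ between systems
    "DATE":      "DATE",
    "TIMESTAMP": "TIMESTAMP",   # cast or truncate if source precision is unsupported
    "DECIMAL":   "DECIMAL",     # carry precision/scale across explicitly
    "INTEGER":   "INTEGER",
    "SMALLINT":  "SMALLINT",
    "LARGEINT":  "BIGINT",      # SQL/MX 64-bit integer -> Redshift BIGINT
}

def map_type(sqlmx_type):
    """Look up a Redshift type, defaulting to VARCHAR for unmapped types."""
    return SQLMX_TO_REDSHIFT.get(sqlmx_type.upper(), "VARCHAR")
```

Defaulting unknown types to `VARCHAR` is a conservative fallback that keeps the load from failing; flag such columns for manual review rather than leaving the default in place.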
Best Practice: Initial & Incremental Loads with Database Query Component
The recommended pattern for ongoing data extraction is:
- Initial Load: Extract all data (no filter).
- Incremental Loads: Extract only new or changed data using filters.
The same Database Query component can be used for both; only the SQL query differs.
Initial Load Example
No filter clause: extracts all records.
SELECT CUSTOMER_ID, AMOUNT, TRANSACTION_DATE FROM SALES.TRANSACTIONS;
Incremental Load Example
Use a filter that selects only data loaded since the last successful extraction (for example, using the value of the highest `TRANSACTION_DATE` previously loaded):

SELECT CUSTOMER_ID, AMOUNT, TRANSACTION_DATE FROM SALES.TRANSACTIONS WHERE TRANSACTION_DATE > DATE '2024-06-01';

(Replace '2024-06-01' with your last loaded value, typically passed as a parameter.)
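The watermark pattern above can be sketched as a small query builder: with no stored watermark you get the initial full-load query, and with one you get the incremental filter. In Matillion the watermark would typically live in a pipeline variable; the table and column names here are taken from the examples above.

```python
# Sketch of the watermark pattern: the absence of a stored high-water mark
# yields the initial full-load query; otherwise the query filters on rows
# newer than the last successfully loaded TRANSACTION_DATE.

def build_extract_query(last_watermark=None):
    """Return the full-load query, or an incremental query if a watermark exists."""
    base = ("SELECT CUSTOMER_ID, AMOUNT, TRANSACTION_DATE "
            "FROM SALES.TRANSACTIONS")
    if last_watermark is None:
        return base + ";"  # initial load: no filter
    return base + f" WHERE TRANSACTION_DATE > DATE '{last_watermark}';"
```

After each successful run, persist the maximum `TRANSACTION_DATE` actually loaded and pass it back in as `last_watermark` on the next run, so the same Database Query component serves both load modes.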
For more guidance on incremental data replication strategies and usage patterns, refer to the Matillion Exchange article.
Data Integration Architecture
Loading data in advance of integration is a core advantage of the ELT (Extract, Load, Transform) architecture, as it enables you to divide and conquer the data integration process by splitting it into two separate steps. With this approach, raw source data is first loaded directly into the Redshift database, postponing complex transformation and integration logic until after loading. Data integration inherently requires data transformation, and the most effective way to manage this complexity is by implementing data transformation pipelines. These pipelines structure and orchestrate the sequencing of transformations required to prepare data for analytics. Another significant advantage of the ELT architecture is that all integration and transformation tasks are performed natively inside the target Redshift environment, rather than on external platforms. This design delivers fast, on-demand, and highly scalable data processing, while eliminating the need for additional, potentially costly, third-party processing infrastructure.