Integrate data from NuoDB to Amazon Redshift using Matillion

Our NuoDB to Redshift connector enables seamless and efficient data transfer to Redshift within minutes, keeping your information current without the need for manual coding or managing intricate ETL scripts.


Extracting data from NuoDB to Amazon Redshift

Extracting data from NuoDB is a critical process for organizations seeking to leverage the analytical power and scalability of Amazon Redshift. Successful data migration enables more advanced reporting, seamless integration with business intelligence tools, and improved decision-making. However, the process involves several essential steps to ensure a secure, reliable, and efficient transfer of data between the two systems.

In this article, we will guide you through the key stages of moving data from NuoDB to Redshift. We will begin by creating an appropriate identity in NuoDB, laying a secure foundation for data access. For users of Matillion, we'll explain how to check for the required JDBC driver, or how to acquire it if it is not yet installed. Next, we will discuss best practices for establishing network connectivity between NuoDB (your source database) and Amazon Redshift (your target data warehouse), highlighting important considerations for security and performance.

Finally, we will outline how to query data—both for comprehensive initial loads and for ongoing incremental updates—to keep your Redshift environment in sync with data changes in NuoDB. By following this step-by-step approach, you can streamline your data extraction and migration process, paving the way for effective downstream analytics in Redshift.


What is NuoDB?

NuoDB is a distributed, cloud-native SQL database designed to deliver high availability, horizontal scalability, and transactional consistency for modern applications. It employs a multi-layered architecture separating transactional and storage processes, allowing for elastic scaling without downtime or performance degradation. Unlike traditional relational databases constrained by single-node architectures, NuoDB leverages peer-to-peer communication and a shared-nothing model, supporting both on-premise and hybrid cloud deployments. Its support for ACID transactions, standard SQL, and multi-version concurrency control (MVCC) enables robust data integrity while maintaining low latency and high throughput in dynamically changing environments. This makes NuoDB particularly well-suited for organizations seeking to modernize their data infrastructure with cloud agility without sacrificing transactional reliability.


What is Amazon Redshift?

Amazon Redshift is a managed, cloud-based data warehouse built on PostgreSQL for fast, large-scale analytics. Using columnar storage, data compression, and Massively Parallel Processing (MPP), Redshift efficiently handles complex queries across massive datasets. It easily integrates with the AWS ecosystem and supports diverse data loading, transformation, and visualization tools. Features like automatic vacuuming, workload management, and concurrency scaling optimize query performance and scalability. These capabilities make Redshift a popular, reliable, and cost-effective choice for enterprises seeking high-performance cloud data warehousing solutions.

Why Move Data from NuoDB into Amazon Redshift

Leveraging Data Copy from NuoDB to Redshift for Enhanced Analytics

A data engineer or architect may wish to copy data from NuoDB into Amazon Redshift for several strategic reasons. NuoDB often contains data that is timely, relevant, and potentially valuable for business intelligence, analytics, or reporting purposes. By consolidating this data with information from other sources within Redshift, organizations can unlock richer insights and create a more comprehensive view of their business activity. Importantly, leveraging Redshift for data integration and analytics ensures that complex queries and analytical workloads do not impose additional strain on the NuoDB transactional system, thereby maintaining its performance and availability for primary operations while still enabling advanced data-driven decision-making.

Creating an Identity in NuoDB

To create a new user (identity) in a NuoDB database, you interact with the system using SQL statements. NuoDB manages authentication and user properties via the standard SQL CREATE USER statement and related commands.

Prerequisites

  • Ensure you have administrator privileges, as only database administrators (ADMIN authority) can create or alter users.
  • Connect to your NuoDB database using the nuosql CLI or any compatible SQL client.

Step-by-Step Instructions

1. Connect to the Database

nuosql <database_name> --user DBA --password <admin_password>

Replace <database_name> and <admin_password> with your actual database name and admin password.

2. Create a User

Use the following SQL statement to create a new user, substituting <username> and <password> with your desired username and password.

CREATE USER <username> IDENTIFIED BY '<password>';

Example:

CREATE USER analyst1 IDENTIFIED BY 'securePW1!';

3. Grant Privileges (Optional)

New users do not have any privileges by default. To allow the user to access or modify data, you must explicitly grant them permissions. For example, to allow the user to connect to the database and perform SELECT queries:

GRANT CONNECT TO analyst1;
GRANT SELECT ON SCHEMA <schema_name> TO analyst1;

Replace <schema_name> with the actual schema you wish to grant access to.

4. Verify User Creation

You can confirm the user has been created using:

SELECT * FROM system.users;

This statement retrieves a list of all users in the system.


For further details on user and privilege management, refer to the official NuoDB SQL documentation.

Installing the JDBC Driver

To enable connectivity to NuoDB databases from Matillion Data Productivity Cloud, you will need to manually download and install the appropriate JDBC driver. Due to licensing or redistribution restrictions, the NuoDB JDBC driver does not ship with Matillion Data Productivity Cloud and must be acquired separately.

Step 1: Download the NuoDB JDBC Driver

  1. Go to the NuoDB official downloads page: https://www.nuodb.com/downloads
  2. Locate the JDBC driver section. When presented with multiple options, look specifically for the Type 4 JDBC driver, which is typically recommended for most JDBC-compatible applications, including Matillion Data Productivity Cloud.
  3. Download the appropriate JDBC driver package for your operating system and Java version. Note where you save this file, as you will need it in upcoming steps.

Step 2: Install the JDBC Driver in Matillion Data Productivity Cloud

Matillion Data Productivity Cloud ships with only a set of bundled database drivers, so external drivers such as the NuoDB JDBC driver must be uploaded manually.

  1. Review and follow the official installation guidance at:
    https://docs.matillion.com/data-productivity-cloud/agent/docs/uploading-external-drivers/
  2. Prepare to upload the downloaded NuoDB JDBC driver to all relevant Matillion Agents or instance locations as instructed.
  3. Complete the installation by placing the driver file(s) into the appropriate directory, as detailed in the Matillion documentation linked above.
  4. Restart or reconfigure your Matillion environment/agent if required, so that the application can recognize and utilize the newly uploaded driver.

Step 3: Configure and Use the Driver in Matillion

With the driver installed, you can now begin creating connections and running components that interact with your NuoDB database.

  • Follow the usage instructions provided here for setting up Database Query components using your uploaded driver:
    https://docs.matillion.com/data-productivity-cloud/designer/docs/database-query/

Refer to these instructions to configure your connections and credentials, specifying the correct JDBC driver that you have just installed.
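When specifying the driver, you will typically need the driver class name and a JDBC connection URL. The values below follow common NuoDB JDBC conventions, but the class name and URL format are assumptions to confirm against the documentation bundled with the driver version you downloaded:

```text
Driver class: com.nuodb.jdbc.Driver
JDBC URL:     jdbc:com.nuodb://<broker_host>:<port>/<database_name>
```

Substitute <broker_host>, <port>, and <database_name> with the values for your own deployment.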

Note: Ensure that your use of the JDBC driver complies with the NuoDB licensing requirements, as redistribution is restricted and you are responsible for adhering to their terms.

Checking network connectivity

Before connecting Matillion Data Productivity Cloud to your NuoDB database, you must ensure that the database allows incoming connections from the appropriate sources. Which sources these are depends on your deployment configuration: with a Full SaaS deployment, the Matillion-hosted agent must be able to reach the database over the network, while with a Hybrid SaaS deployment the agent runs in your own environment and the relevant firewall rules apply there.

Additionally, if your NuoDB database is referenced by a DNS hostname rather than a static IP address, ensure that the Matillion agent (whether Full SaaS or Hybrid SaaS) is able to resolve the database’s DNS name. Proper DNS resolution is required for the agent to connect successfully.
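As a quick sanity check before configuring Matillion, you can verify DNS resolution and TCP reachability from the machine where the agent runs. This is a minimal sketch, not part of the Matillion product: the hostname and port are placeholders you would replace with your own values (confirm the correct NuoDB port for your deployment).

```python
import socket

def check_reachability(host: str, port: int, timeout: float = 5.0) -> dict:
    """Resolve a hostname and attempt a TCP connection, reporting each step."""
    result = {"host": host, "port": port, "resolved_ip": None, "tcp_ok": False}
    try:
        # DNS resolution: required for the Matillion agent to find the database
        result["resolved_ip"] = socket.gethostbyname(host)
    except socket.gaierror:
        return result  # DNS failed; no point attempting a TCP connection
    try:
        # TCP connect: confirms the network path and firewall rules allow access
        with socket.create_connection((host, port), timeout=timeout):
            result["tcp_ok"] = True
    except OSError:
        pass  # host resolved but the port is unreachable or filtered
    return result
```

If DNS resolution fails here, it will also fail for the agent, so fix name resolution before troubleshooting firewall rules.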

Querying Data from a NuoDB Database

This guide explains how to query data from a NuoDB database, with code examples and best practices for initial and incremental data loads, especially in ETL scenarios that leverage tools like Matillion. For more information, see Matillion's article on incremental load data replication strategy.


1. Example: SQL SELECT Statements in NuoDB

Basic Table Query

SELECT * FROM customers;

Conditional Query

SELECT customer_id, customer_name, email
FROM customers
WHERE active = TRUE;

Query with Sorting and Limiting

SELECT order_id, order_date, amount
FROM orders
WHERE order_status = 'SHIPPED'
ORDER BY order_date DESC
LIMIT 100;


2. Datatype Conversion: NuoDB to Redshift

When migrating or querying data across NuoDB and Amazon Redshift, be aware of possible datatype conversions:

NuoDB Type      Common Redshift Equivalent
INTEGER         INTEGER
DOUBLE          DOUBLE PRECISION
VARCHAR(n)      VARCHAR(n)
TIMESTAMP       TIMESTAMP
BLOB            VARBYTE
BOOLEAN         BOOLEAN

Make sure to review column types to avoid schema or load errors during data transfer.
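If you generate target DDL programmatically, the mapping above can be expressed as a small lookup. This is an illustrative sketch covering only the types in the table; the function name is our own, and you would extend the mapping for any other types in your schema:

```python
import re

# Illustrative mapping taken from the table above; extend as needed.
TYPE_MAP = {
    "INTEGER": "INTEGER",
    "DOUBLE": "DOUBLE PRECISION",
    "TIMESTAMP": "TIMESTAMP",
    "BLOB": "VARBYTE",
    "BOOLEAN": "BOOLEAN",
}

def redshift_type(nuodb_type: str) -> str:
    """Translate a NuoDB column type into its common Redshift equivalent."""
    t = nuodb_type.strip().upper()
    # VARCHAR(n) carries its length parameter across unchanged
    m = re.fullmatch(r"VARCHAR\((\d+)\)", t)
    if m:
        return f"VARCHAR({m.group(1)})"
    try:
        return TYPE_MAP[t]
    except KeyError:
        raise ValueError(f"No mapping defined for NuoDB type {nuodb_type!r}")
```

Raising on unknown types (rather than guessing) surfaces schema mismatches before they become load errors in Redshift.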


3. Patterns for Initial and Incremental Loads

A reliable pattern for loading data is a one-time initial load followed by ongoing incremental loads, using the same ETL component for querying, such as Matillion's Database Query component.

Initial Load: No Filter Clause

For the initial load, select all relevant data:

-- No filter; retrieves all records
SELECT * FROM customers;

Incremental Load: With Filter Clause

For subsequent periodic (incremental) loads, restrict your query to only new or modified data. This is typically achieved by filtering rows based on an "updated_at" or "modified_date" column:

-- Only new/updated records since the last successful load
SELECT *
FROM customers
WHERE updated_at > '2024-06-01 00:00:00';

Tip: Replace the filter value with a variable or the latest timestamp from your destination system to ensure only fresh data is pulled.

Reference: read more detail in Matillion's article on incremental load data replication strategy.


4. ETL Component Usage

  • Use the same Database Query component in both the initial and incremental load workflows.
  • For the initial load, omit the WHERE clause.
  • For incremental loads, add an appropriate WHERE clause based on your business logic (e.g., WHERE updated_at > ?).

Note: Adjust your ETL orchestration to track and manage high-water marks, ensuring data consistency across loads.
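The high-water mark logic can be sketched as a small function that chooses between the two query shapes shown earlier. This is an assumption-laden illustration (the function and parameter names are our own, and a production pipeline would use bound parameters and persist the watermark in the target system rather than interpolating a literal):

```python
from datetime import datetime
from typing import Optional

def build_extract_query(table: str, watermark_column: str,
                        last_watermark: Optional[datetime]) -> str:
    """Return the initial-load query when no watermark exists,
    otherwise an incremental query filtered on the watermark column."""
    base = f"SELECT * FROM {table}"
    if last_watermark is None:
        return base  # initial load: no filter, retrieve all records
    # incremental load: only rows changed since the last successful run
    return (f"{base} WHERE {watermark_column} > "
            f"'{last_watermark:%Y-%m-%d %H:%M:%S}'")
```

After each successful load, record the maximum watermark value seen in the extracted batch; passing it into the next run yields exactly the incremental filter described above.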

Data Integration Architecture

Loading data into Redshift in advance of integration is a classic example of the "divide and conquer" strategy, where the process is split into two distinct steps: first, extracting and loading the raw data, and second, performing integration and transformation. This modular approach is a key advantage of the Extract, Load, Transform (ELT) architecture, as it allows for greater flexibility and efficiency in pipeline management. Since data integration inherently requires transforming the loaded data to fit business requirements or analytical models, it is best handled using robust data transformation pipelines. With ELT, these transformation and integration processes take place directly inside the target Redshift database. This not only provides rapid, on-demand performance and horizontal scalability, but also eliminates the need for additional data processing infrastructure—helping organizations reduce both operational complexity and cost.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.