Integrate data from NuoDB to Databricks using Matillion

The NuoDB to Databricks connector seamlessly transfers your data to Databricks within minutes, ensuring it remains current without requiring manual coding or complex ETL scripting.


Extracting data from NuoDB to Databricks

Extracting data from NuoDB is a valuable step in enabling advanced analytics, reporting, or integration with modern platforms such as Databricks. Efficient data extraction and loading workflows can unlock the full potential of your organizational data by bridging traditional RDBMS systems with cloud-based analytics environments.

In this article, we will guide you through a systematic approach to migrating or synchronizing data from NuoDB to Databricks. We begin by covering the creation of an identity in NuoDB to manage secure and trackable data access. Next, we outline the requirements for connecting via JDBC, with special attention to users leveraging Matillion for orchestration—making sure the correct JDBC driver is in place. Network connectivity considerations between the source (NuoDB) and the target (Databricks or your ingestion layer) are then discussed to help you avoid common pitfalls. Finally, we examine best practices for querying your data, including strategies for both full initial extracts and incremental data loading. Whether you are building a one-off migration pipeline or establishing a recurring data integration workflow, this article will provide clear, actionable guidance for each critical step in the process.


What is NuoDB?

NuoDB is a distributed, cloud-native relational database designed to deliver SQL compliance alongside elastic scalability for modern applications. It utilizes a multi-node, peer-to-peer architecture that separates transactional processing (transaction engines) from storage management (storage managers), enabling seamless scaling both horizontally and vertically without sacrificing consistency or performance. NuoDB employs a durable, ACID-compliant approach, and supports continuous availability and online schema changes, making it well-suited for mission-critical workloads in dynamic cloud or hybrid environments. Its support for automation, containerization, and integration with orchestration platforms like Kubernetes further aligns NuoDB with contemporary DevOps and microservices architectures.


What is Databricks?

The Databricks database is an integral component of the Databricks Lakehouse Platform, designed to unify data warehousing and AI capabilities on a single, cloud-based architecture. Built on top of Apache Spark, it provides robust, scalable storage and querying functionality using Delta Lake, an open-source storage layer that ensures ACID compliance and reliable data versioning. Databricks supports both structured and semi-structured data, enabling efficient ETL workflows, interactive analytics, and machine learning operations at scale. Through features like managed tables, time travel, and seamless integration with popular data formats, the Databricks database simplifies the management of large datasets while maintaining high performance and data governance standards.

Why Move Data from NuoDB into Databricks

The Value of Transferring Data from NuoDB to Databricks for Advanced Analytics

A data engineer or architect may wish to copy data from NuoDB into Databricks for several compelling reasons. Firstly, the data residing in NuoDB often holds significant value but may be underutilized when kept in isolation. By integrating this data with information from multiple other sources in Databricks, organizations can derive deeper insights, unlock new analytical capabilities, and support more comprehensive decision-making processes. Furthermore, performing complex data processing or analytics tasks natively within NuoDB could impose additional workload and potentially impact transactional performance. By offloading these workloads to Databricks, which is specifically designed for scalable analytics and big data processing, teams ensure that NuoDB remains dedicated to its primary transactional responsibilities while still enabling value extraction from its data in a high-performance, cost-effective environment.

Creating a User in NuoDB

To provide user-level access control in NuoDB, you can create database users (also called identities). Users are typically managed using SQL commands through the NuoDB SQL client (`nuosql`) or any SQL interface connected to your NuoDB database.

Below are step-by-step instructions for creating a user in NuoDB.


Prerequisites

  • Ensure you have administrative privileges (for example, connecting as user `dba`).
  • Ensure you have access to the `nuosql` command line or an equivalent SQL interface.

1. Connect to the Database

Use the following command to start a SQL session. Replace `<db-name>`, `<db-host>`, and `<db-port>` as required:

nuosql <db-name>@<db-host>:<db-port> -u dba -p <admin-password>

2. Create a User

NuoDB manages login identities with the `CREATE USER` SQL statement. The basic syntax is:

CREATE USER <username> IDENTIFIED BY '<password>';

Example: To create a user named `appuser` with the password `Complex!Passw0rd`:

CREATE USER appuser IDENTIFIED BY 'Complex!Passw0rd';

3. Grant Privileges

After creating the user, grant appropriate privileges so they can operate within the database (e.g., `CONNECT` and `SELECT`).

Example: To allow the user to connect and select from all tables:

GRANT CONNECT TO appuser;
GRANT SELECT ON ALL TABLES TO appuser;

Note: Adjust privileges per your security policy and requirements.


4. Verify User Creation

You can verify the user by querying the `SYSTEM.USERS` system table:

SELECT * FROM SYSTEM.USERS;

This will list all users currently defined in the database.


5. Test User Login (Optional)

To ensure the new identity works, try connecting as the newly created user:

nuosql <db-name>@<db-host>:<db-port> -u appuser -p Complex!Passw0rd



Installing the JDBC Driver

At the time of writing, the JDBC driver for the NuoDB database is not included by default in Matillion Data Productivity Cloud. This omission is due to the licensing and redistribution restrictions associated with the NuoDB JDBC driver. As a result, installing the necessary driver is a manual process.

Downloading the NuoDB JDBC Driver

  1. Access the NuoDB Downloads Portal:
    Navigate to https://www.nuodb.com/downloads.

  2. Locate the JDBC driver:
    On the downloads page, look for the JDBC driver—specifically, choose the Type 4 JDBC driver if multiple options are presented. The Type 4 driver is a pure Java implementation that does not require native libraries and is suitable for cloud-based integration scenarios like Matillion Data Productivity Cloud.

  3. Download the driver:
    Follow the licensing steps as required by NuoDB and download the `.jar` file for the JDBC driver to your local machine.

Installing the Driver in Matillion Data Productivity Cloud

With the JDBC driver `.jar` file downloaded, you will need to upload it into your Matillion Data Productivity Cloud environment. The official Matillion documentation on external driver installation provides detailed steps for:

  • Accessing the Matillion Agent.
  • Uploading the `.jar` file via the user interface.
  • Restarting services or validating the installation, if required.

Be sure to review any additional prerequisites or compatibility notes in the linked instructions, as these may vary depending on your deployment configuration and version of Matillion Data Productivity Cloud.

Additional Usage Guidance

Once the NuoDB JDBC driver has been successfully uploaded and installed, refer to Matillion's connector documentation for step-by-step usage instructions, including how to specify connection details for your NuoDB instance and how to utilize the driver within data pipelines in Matillion Data Productivity Cloud.

By following these instructions, you can extend your Matillion Data Productivity Cloud environment to connect with NuoDB databases using the JDBC protocol, subject to licensing compliance and any organizational security policies.
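For teams scripting connectivity outside Matillion, the sketch below assembles a NuoDB JDBC connection URL. The URL shape (`jdbc:com.nuodb://host:port/database`) and default port 48004 follow NuoDB's documented conventions, but verify them against your installed driver version; the helper name and hostnames are illustrative, not part of any NuoDB or Matillion API.

```python
def build_nuodb_jdbc_url(host, database, port=48004, schema=None):
    """Assemble a NuoDB JDBC URL of the form
    jdbc:com.nuodb://<host>:<port>/<database>[?schema=<schema>].
    The driver class registered alongside this URL is
    com.nuodb.jdbc.Driver, loaded from the downloaded .jar file."""
    url = f"jdbc:com.nuodb://{host}:{port}/{database}"
    if schema:
        url += f"?schema={schema}"  # optional default schema for the session
    return url

# Example: connection URL for a hypothetical database "testdb"
print(build_nuodb_jdbc_url("nuodb-host.invalid", "testdb", schema="sales"))
```

The same URL string is what you would paste into the connection settings of your ingestion tool once the driver is installed.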

Checking network connectivity

To ensure successful connectivity between the Matillion Data Productivity Cloud and your NuoDB database, you must confirm that the NuoDB database allows incoming connections from the appropriate sources, depending on your deployment type (Matillion Full SaaS or Hybrid SaaS).

In addition, if your NuoDB database is referenced using a DNS hostname (rather than a direct IP address), your Matillion Full SaaS or Hybrid SaaS agent must be able to resolve the given hostname to the correct IP address. Ensure there are no DNS resolution issues that would prevent successful connections.
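A quick way to validate both requirements (DNS resolution and port reachability) from the machine running your agent is a small probe like the one below. This is a generic TCP/DNS check, not a Matillion feature; the hostname and port in the example are placeholders.

```python
import socket

def check_database_reachable(host, port, timeout=3.0):
    """Resolve the hostname, then attempt a TCP connection to the port.
    Returns (ok, detail) so failures can be logged with their cause."""
    try:
        ip = socket.gethostbyname(host)  # DNS resolution check
    except socket.gaierror as exc:
        return False, f"DNS resolution failed for {host}: {exc}"
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True, f"{host} -> {ip}, port {port} reachable"
    except OSError as exc:
        return False, f"{host} -> {ip}, port {port} not reachable: {exc}"

# Example: probe the NuoDB default port on a placeholder hostname
ok, detail = check_database_reachable("nuodb-host.invalid", 48004)
print(detail)
```

Running this from the agent's network (rather than your laptop) is the meaningful test, since firewall rules and DNS views often differ between the two.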

Querying Data from a NuoDB Database

This guide explains how to query data from a NuoDB database, provides SQL SELECT statement examples, and outlines loading strategies—highlighting data type considerations when working with platforms like Databricks.


Example: NuoDB SQL SELECT Statements

To query data from a NuoDB database, you can use standard SQL SELECT statements. Here are some common examples:

```
-- Select all rows from a table
SELECT * FROM customers;

-- Select specific columns and filter with a WHERE clause
SELECT customer_id, name, email
FROM customers
WHERE status = 'active';

-- Aggregate data with GROUP BY
SELECT status, COUNT(*) AS total
FROM orders
GROUP BY status;

-- Join two tables
SELECT o.order_id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;
```

Datatype Conversion Considerations

When transferring data between NuoDB and other platforms such as Databricks, datatype conversion may occur. Not all data types map directly; for example:

| NuoDB Datatype | Possible Databricks Equivalent |
| --- | --- |
| STRING | STRING |
| INTEGER | INT |
| FLOAT/DOUBLE | FLOAT/DOUBLE |
| TIMESTAMP | TIMESTAMP |
| BOOLEAN | BOOLEAN |

Be aware of possible precision or format differences, especially for date, timestamp, and numeric types.
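In pipeline code, a mapping like this is often captured as a simple lookup table. The sketch below extends the table above with a few common variants (VARCHAR, BIGINT, NUMERIC) as assumptions, and falls back to STRING for unmapped types as a conservative, lossless default; confirm precision and scale for numeric types against your actual schema.

```python
# Illustrative NuoDB -> Databricks type mapping; VARCHAR, BIGINT, and
# NUMERIC entries are assumptions beyond the table in the article.
NUODB_TO_DATABRICKS = {
    "STRING": "STRING",
    "VARCHAR": "STRING",
    "INTEGER": "INT",
    "BIGINT": "BIGINT",
    "FLOAT": "FLOAT",
    "DOUBLE": "DOUBLE",
    "NUMERIC": "DECIMAL(38,10)",  # confirm precision/scale for your schema
    "TIMESTAMP": "TIMESTAMP",
    "BOOLEAN": "BOOLEAN",
}

def map_column_type(nuodb_type):
    """Look up the Databricks equivalent for a NuoDB column type,
    defaulting to STRING for anything unmapped."""
    return NUODB_TO_DATABRICKS.get(nuodb_type.upper(), "STRING")
```

Keeping the mapping in one place makes it easy to audit and adjust when a new source column type appears.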

Loading Patterns: Initial and Incremental Loads

The best practice for populating your target system (e.g., Databricks) from NuoDB is:

  1. Once-off Initial Load: Full extract of existing data.
  2. Incremental Loads: Load only new or changed records, improving efficiency and reducing duplicates.

Both load types use the same Database Query component (e.g., Matillion's Database Query), but are differentiated by their SQL filter clauses.

Initial Load Example

The initial load fetches all data from the source table, so no `WHERE` clause is applied:

SELECT * FROM customers;

Incremental Load Example

For ongoing data loads, you should include a filter, such as a timestamp or an incrementing ID, to extract only new or updated rows. For example:

SELECT * 
FROM customers 
WHERE last_updated > '${last_loaded_timestamp}';

  • `last_updated` should be a column in your table indicating change time.
  • `${last_loaded_timestamp}` is typically a parameter set by your ETL tool to remember the last successfully loaded value.

For an in-depth guide, see Matillion's Incremental Load Data Replication Strategy.


Note: Ensure the filter condition for incremental loads correctly matches your data's semantic for changes. Both initial and incremental loads should utilize the same database connector/component for simplicity and consistency.
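The two loading patterns above can be sketched as a single query builder plus a watermark tracker. The function and parameter names are illustrative; in Matillion, the watermark would typically be held in a pipeline variable rather than application code.

```python
def build_extract_query(table, watermark_column=None, last_loaded=None):
    """Full extract when no watermark is supplied; incremental otherwise."""
    query = f"SELECT * FROM {table}"
    if watermark_column and last_loaded:
        # Only rows changed since the last successful load
        query += f" WHERE {watermark_column} > '{last_loaded}'"
    return query

def next_watermark(rows, watermark_column, current):
    """Advance the stored watermark to the newest value seen in this batch."""
    return max([row[watermark_column] for row in rows] + [current])

# Initial load, then an incremental load driven by the stored watermark
initial_sql = build_extract_query("customers")
incremental_sql = build_extract_query(
    "customers", "last_updated", "2024-01-01 00:00:00"
)
```

Because both paths emit the same `SELECT * FROM <table>` core, the same database connector component can run either load, matching the recommendation above.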

Data Integration Architecture

Loading data in advance of integration enables organizations to divide and conquer the data integration challenge by splitting the process into two distinct steps: first, ingestion (loading data) and second, transformation and integration. This division is a key advantage of the Extract, Load, and Transform (ELT) architecture, as it enables raw data to be quickly loaded into the Databricks database before any transformation is performed. Data integration inherently requires data transformation, and the most effective way to facilitate this is by implementing robust data transformation pipelines. These pipelines systematically prepare and combine data into formats suitable for analytics or downstream operations. Another significant benefit of the ELT approach is that all transformation and integration steps occur directly within the Databricks environment. This on-platform processing is fast, supports on-demand scaling, and eliminates the need for additional, external data processing infrastructure, thereby reducing overall costs and simplifying operational management.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.