
Integrate data from H2 Database to Amazon Redshift using Matillion

Our H2 Database to Redshift connector enables seamless data transfer to Redshift within minutes, ensuring your information remains current without the need for manual coding or managing intricate ETL scripts.


Extracting data from H2 Database to Amazon Redshift

Extracting data from H2 Database enables organizations to migrate or synchronize datasets with other platforms, such as Amazon Redshift, for advanced analytics and reporting. In modern data environments, seamless data transfer between operational stores and data warehouses is critical for maintaining business intelligence workflows and driving strategic decision-making.

This article walks through the essential steps for successful H2 Database to Redshift extraction and loading. We start by creating an appropriate identity within H2 Database to manage permissions securely. Special attention is given to users of Matillion, a popular ETL tool, with instructions for checking for or acquiring the JDBC driver required for H2 connectivity. We also cover verifying network connectivity between the H2 source and the Redshift target to prevent avoidable integration obstacles. Finally, we demonstrate how to query data for extraction, and how to transition from an initial full load to ongoing incremental loads for efficient, up-to-date data transfer. By following these steps, organizations can streamline the extraction and loading process, minimizing manual effort while maximizing data integrity and performance.


What is H2 Database?

H2 Database is a high-performance, open-source relational database management system written in Java. It is well-suited for Java-based applications due to its lightweight architecture and seamless integration, functioning in both embedded and server modes. H2 supports standard SQL, multi-version concurrency control (MVCC), and advanced features such as in-memory databases, encryption, and clustering. Its small footprint and JDBC API compatibility make it an ideal choice for unit testing, rapid prototyping, and applications requiring a fast, in-process database solution. The database's active development and comprehensive documentation further contribute to its popularity within the Java ecosystem.


What is Amazon Redshift?

Amazon Redshift is a fully managed, petabyte-scale data warehouse service designed for online analytic processing (OLAP) and optimized for complex, high-performance queries on structured data. Built on a modified version of PostgreSQL, Redshift utilizes columnar storage, data compression, and massively parallel processing (MPP) to enable rapid querying and efficient storage. It integrates seamlessly with a wide range of data sources and ecosystem tools within AWS, supporting features such as automatic scaling, workload management, and spectrum for querying data directly in S3. Its pay-as-you-go pricing model and managed infrastructure allow organizations to focus on analytics without the overhead of provisioning and maintaining hardware or database software.

Why Move Data from H2 Database into Amazon Redshift

Unlocking Advanced Analytics: Copying Data from H2 Database to Amazon Redshift

A data engineer or architect may find it beneficial to copy data from an H2 Database into Amazon Redshift for several compelling reasons. Firstly, the H2 Database may contain operational data that holds significant potential value for analytics and business intelligence. To fully realize this value, it is often necessary to integrate the information stored in H2 with data from other sources, thereby enabling richer analysis, improved reporting, and more informed decision-making. By transferring the data to Redshift, a scalable cloud data warehouse, organizations can perform complex queries and integrations without imposing additional workload on the H2 Database, which might otherwise degrade performance or disrupt transactional operations. This approach ensures that H2 can continue to serve its primary function while Redshift handles the demands of data integration and analysis at scale.

Creating a User in H2 Database

In H2 Database, you can create and manage database users using standard SQL statements. Users are defined at the database level and can be assigned specific privileges.

Prerequisites

  • You must have administrative privileges (typically the `SA` user) to create new users.
  • You should be connected to your H2 database using a management tool (e.g., H2 Console, JDBC client, or command-line).

Creating a User

To create a user in H2, use the `CREATE USER` statement with the desired username and password. For example, to create a user named `devuser` with the password `MySecretPass`, execute the following SQL:

CREATE USER devuser PASSWORD 'MySecretPass';

Replace `devuser` and `MySecretPass` with your preferred username and a strong password.

Granting Privileges

By default, a new user has no privileges. To enable the user to perform specific actions, you must grant the required permissions. For example, to grant all privileges to the new user:

GRANT ALL ON SCHEMA PUBLIC TO devuser;

Or, to grant minimal privileges, such as `SELECT` rights:

GRANT SELECT ON SCHEMA PUBLIC TO devuser;

Example Workflow

Here is a combined script that creates a user and grants full schema privileges:

```
-- Create the user
CREATE USER devuser PASSWORD 'MySecretPass';

-- Grant access to all tables in the PUBLIC schema
GRANT ALL ON SCHEMA PUBLIC TO devuser;
```

Note: User and privilege management applies per database schema. Adjust the schema name if you are using a schema other than `PUBLIC`.

Listing Existing Users

To view all users in the H2 database:

SELECT * FROM INFORMATION_SCHEMA.USERS;

This will list usernames and their administrative status.


For further details, refer to the official H2 documentation on user management.

Installing the JDBC Driver

The H2 Database JDBC driver is not included by default in Matillion Data Productivity Cloud due to licensing or redistribution restrictions. To connect Matillion to an H2 database, you must manually download and install the necessary JDBC driver (preferably a Type 4 JDBC driver).

Follow the steps below to download, install, and configure the H2 JDBC driver for use with Matillion Data Productivity Cloud.


1. Download the H2 JDBC Driver

  1. Visit the official H2 Database download page: https://www.h2database.com/html/download.html.
  2. Identify and download the Type 4 JDBC driver, which is a platform-independent, pure Java driver suitable for most integration scenarios. The driver is typically packaged as a JAR file (e.g., `h2-*.jar`).

2. Install the Driver in Matillion Data Productivity Cloud

Once you have downloaded the JDBC driver JAR, you'll need to upload it to your Matillion environment. Installation details are documented in Matillion's official guide: Uploading External JDBC Drivers. Follow the instructions there to upload the JAR file and make it available to your Matillion instance.

3. Connect to H2 Database from Matillion

After installation, you can configure and use the driver in Matillion for database connectivity and queries. Refer to the official documentation for usage instructions:
Database Query Component Usage.

Be sure to select the newly installed driver when setting up the connection, then proceed with your data integration tasks as needed.
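For reference, an H2 server running in TCP mode is typically reached with the following driver class and JDBC URL shape. The host and database path below are placeholders you must adapt to your environment; 9092 is H2's default TCP server port:

```
Driver class: org.h2.Driver
JDBC URL:     jdbc:h2:tcp://<host>:9092/<database-path>
```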

Checking network connectivity

To successfully connect Matillion Data Productivity Cloud to your H2 database, you must ensure that H2 is configured to accept incoming network connections from the correct sources for your deployment mode (Full SaaS or Hybrid SaaS).

If your H2 Database server is referenced by a DNS hostname rather than a direct IP address, ensure that the Full SaaS or Hybrid SaaS agent can resolve that hostname to a reachable IP address. If DNS resolution fails, network connectivity between Matillion Data Productivity Cloud and your H2 Database cannot be established.
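As a quick pre-flight check, the DNS resolution and TCP reachability described above can be verified with a short script before configuring Matillion. This is an illustrative sketch, not part of Matillion itself; the hostname used at the bottom is a placeholder, and 9092 is assumed to be H2's TCP server port:

```python
import socket

def check_connectivity(host: str, port: int, timeout: float = 3.0):
    """Resolve a hostname and attempt a TCP connection.

    Returns (resolved_ip, reachable); resolved_ip is None if DNS fails.
    """
    try:
        ip = socket.gethostbyname(host)  # DNS resolution step
    except socket.gaierror:
        return None, False
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        reachable = sock.connect_ex((ip, port)) == 0
    return ip, reachable

if __name__ == "__main__":
    # Placeholder hostname: substitute your H2 server's DNS name
    ip, ok = check_connectivity("h2.example.internal", 9092)
    print(f"resolved={ip} reachable={ok}")
```

If the script reports a resolved IP but no reachability, the problem is usually a firewall or security-group rule rather than DNS.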

Querying Data from an H2 Database

This guide explains how to query data from an H2 Database, with an emphasis on best practices for initial and incremental data loads, and considerations for data type conversion (e.g., when loading into Amazon Redshift).

Example H2 Database Query Syntax

H2 Database uses standard SQL syntax for querying data. Below are basic examples of `SELECT` statements:

```
-- Select all columns from a table
SELECT * FROM employees;

-- Select specific columns
SELECT id, first_name, last_name, salary FROM employees;

-- Filtering results
SELECT * FROM employees WHERE department = 'Engineering';

-- Filtering with a date range
SELECT * FROM sales
WHERE transaction_date >= '2024-01-01'
  AND transaction_date <= '2024-03-31';

-- Sorting results
SELECT * FROM products ORDER BY created_at DESC;

-- Aggregate query with grouping
SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department;
```

Data Type Conversion: H2 Database and Redshift

When moving data from H2 Database to Amazon Redshift, be aware that datatype conversion may be necessary. Here are some common mappings:

| H2 Type    | Typical Redshift Type |
|------------|-----------------------|
| INTEGER    | INTEGER               |
| BIGINT     | BIGINT                |
| BOOLEAN    | BOOLEAN               |
| VARCHAR(n) | VARCHAR(n)            |
| DOUBLE     | DOUBLE PRECISION      |
| DATE       | DATE                  |
| TIMESTAMP  | TIMESTAMP             |
| BLOB       | VARBYTE               |

Pay special attention to:

  • Maximum string lengths (`VARCHAR`)
  • Differences in date/time precision
  • Representation of booleans (inserts/updates)
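As one way to automate these mappings, a small lookup helper can translate H2 column types into their Redshift equivalents. This is a simplified sketch that mirrors the mapping table above (with BLOB mapped to Redshift's VARBYTE binary type); real schemas may require precision, scale, and length handling beyond this:

```python
import re

# Simplified H2 -> Redshift base type mapping
H2_TO_REDSHIFT = {
    "INTEGER": "INTEGER",
    "BIGINT": "BIGINT",
    "BOOLEAN": "BOOLEAN",
    "VARCHAR": "VARCHAR",       # length suffix is preserved below
    "DOUBLE": "DOUBLE PRECISION",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP",
    "BLOB": "VARBYTE",          # Redshift's variable-length binary type
}

def redshift_type(h2_type: str) -> str:
    """Translate an H2 column type such as 'VARCHAR(255)' to a Redshift type."""
    match = re.fullmatch(r"(\w+)(\(.+\))?", h2_type.strip().upper())
    if not match:
        raise ValueError(f"Unrecognized type: {h2_type}")
    base, suffix = match.group(1), match.group(2) or ""
    try:
        return H2_TO_REDSHIFT[base] + suffix
    except KeyError:
        raise ValueError(f"No mapping defined for H2 type: {base}")
```

For example, `redshift_type("VARCHAR(255)")` keeps the length suffix, while `redshift_type("DOUBLE")` becomes `DOUBLE PRECISION`.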

Initial Load vs Incremental Load Patterns

The recommended approach for data movement is an initial load followed by ongoing incremental loads, both run through the same Database Query component:

Initial Load

  • Purpose: Populate the destination table with all existing data.
  • Query pattern: No filter clause is needed.

Example:

SELECT * FROM employees;

Incremental Load

  • Purpose: Populate the destination table with only new or changed data since the previous load.
  • Query pattern: Includes a filter clause, often based on a column such as a timestamp or an increasing primary key.

Example:

SELECT * FROM employees WHERE updated_at > '2024-06-01 00:00:00';

Notes

You may use parameters in your ETL process to dynamically inject the last-loaded value.

SELECT * FROM employees WHERE updated_at > '${last_loaded_timestamp}';
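The watermark pattern above can be sketched end-to-end. The example below uses Python with an in-memory SQLite database standing in for H2 (the SQL involved is equivalent for this purpose); in a real pipeline, the stored watermark would live in a pipeline variable or control table, as in the `${last_loaded_timestamp}` example:

```python
import sqlite3

# In-memory SQLite stands in for H2 in this sketch
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Ada", "2024-05-10 09:00:00"), (2, "Grace", "2024-05-20 14:30:00")],
)

# Initial load: no filter clause, fetch everything
initial = conn.execute("SELECT * FROM employees").fetchall()
# Record the high-watermark for the next run
watermark = max(row[2] for row in initial)

# New data arrives after the initial load
conn.execute("INSERT INTO employees VALUES (3, 'Edsger', '2024-06-02 08:15:00')")

# Incremental load: only rows changed since the stored watermark
incremental = conn.execute(
    "SELECT * FROM employees WHERE updated_at > ?", (watermark,)
).fetchall()

print(len(initial), len(incremental))  # 2 1
```

Note that the incremental query fetches only the one row added after the watermark was recorded, so repeated runs stay cheap regardless of table size.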

By following these patterns, you can structure your data extraction logic efficiently and prepare it for later transformation and loading into Redshift or other destinations.

Data Integration Architecture

Loading data into Amazon Redshift ahead of integration is a classic example of the "divide and conquer" strategy, where the process is thoughtfully split into two distinct phases. This separation is a key advantage of the Extract, Load, Transform (ELT) architecture: first, data is extracted from source systems and loaded directly into Redshift, and only then is it transformed and integrated as needed. Since data integration inherently involves data transformation, employing robust data transformation pipelines within Redshift is considered a best practice; these pipelines ensure transformations are consistent, repeatable, and scalable. By leveraging the ELT model, all data transformation and integration tasks are executed within the target Redshift database itself. This approach delivers multiple benefits: it provides speedy, on-demand processing, automatic scalability with your cluster, and eliminates the need for separate, potentially costly, data processing infrastructure outside of Redshift.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.