Integrate data from H2 Database to Snowflake using Matillion

Our H2 Database to Snowflake connector streamlines data transfer to Snowflake within minutes, ensuring your data remains current without the need for manual coding or complex ETL scripting.


Extracting data from H2 Database to Snowflake

Extracting data from H2 Database is a common requirement for organizations looking to leverage cloud data warehousing platforms such as Snowflake. This process, however, involves several key steps to ensure data integrity, security, and efficiency throughout the migration journey. In this article, we will guide you through the essential steps needed to move your data from H2 Database to Snowflake.

We will begin by demonstrating how to create an identity in H2 Database to properly authenticate and authorize data access. For those using Matillion, we will show you how to check for—or acquire—the necessary H2 JDBC driver to establish a reliable connection between the systems. We will also discuss the importance of configuring secure network connectivity between your H2 source and Snowflake target, ensuring data flows smoothly and safely.

Finally, we will explore strategies for querying and extracting data, including both initial bulk transfers and incremental updates to keep your Snowflake environment in sync with changes in the source database. Whether you are looking to perform a one-time migration or maintain ongoing synchronization, these best practices will help streamline the process and deliver accurate results.


What is H2 Database?

H2 Database is a lightweight, fast, open-source RDBMS written in Java, ideal for embedding in Java applications. It supports standard SQL, transactions, MVCC, and offers both in-memory and persistent storage. H2 features a robust JDBC API for easy Java integration and a browser-based console for administration and queries. Compatible with PostgreSQL and MySQL, it simplifies migration and testing. Its ease of use, high performance, and minimal configuration make it a popular choice for unit testing, prototyping, and lightweight production workloads.


What is Snowflake?

Snowflake is a fully managed, cloud-native data platform designed for high-performance data warehousing, analytics, and data sharing. Its unique architecture separates compute from storage, enabling seamless scalability and concurrent user access without contention. Snowflake supports SQL and semi-structured data (like JSON, Parquet), powering workloads from ETL to advanced analytics. It features automatic scaling, robust security, and secure data sharing, with minimal maintenance—freeing teams to focus on analysis over administration. Available across major cloud providers, Snowflake offers flexibility and high availability, making it ideal for enterprises handling complex, evolving data needs.

Why Move Data from H2 Database into Snowflake

Unlocking Advanced Analytics: Transferring Data from H2 to Snowflake

A data engineer or architect may consider copying data from an H2 Database into Snowflake for several compelling reasons. Firstly, the H2 Database might contain datasets that are potentially valuable for deeper analytics or business intelligence purposes. However, the full value of this data often emerges only when it is integrated with information from other sources, enabling enriched analysis and more informed decision-making. By utilizing Snowflake as the platform for this data integration, engineers can execute complex transformations, aggregations, and queries without imposing additional workload or performance overhead on the H2 Database, which is often optimized for transactional efficiency rather than analytics. As a result, organizations can unlock the insights contained within their H2 Database while maintaining the operational stability of their existing systems.

Creating an Identity in H2 Database

In H2 Database, "users" are managed at the database level and determine access rights and authentication. You can create new users via SQL commands executed in your H2 client (such as the H2 Console or JDBC connection).

Instructions

  1. Connect as an Administrator
    To create a new user, you must be connected as a user with administrative rights (commonly SA by default).

  2. Execute the CREATE USER Statement
    Use the following SQL command, substituting <username> and <password> with your chosen credentials:

CREATE USER <username> PASSWORD '<password>';

Example:
To create a user named reporter with password secure123, execute:

CREATE USER reporter PASSWORD 'secure123';

  3. Grant Privileges (Optional)
    By default, the new user will not have any access to schema objects. To grant access, use the GRANT statement.

Example:
To grant all privileges on a schema or table to your new user:

GRANT ALL ON SCHEMA PUBLIC TO reporter;

Or, more restrictively, on a specific table:

GRANT SELECT, INSERT ON TABLE my_table TO reporter;

  4. Verify the New User
    Ensure the user can connect by disconnecting from the current session and reconnecting using the new credentials.

Notes

  • You must use single quotes around the password in the SQL statement.
  • Administrative users (e.g., SA) can manage all user accounts.
  • In H2, ALTER USER only changes a user's password, name, or admin status; object privileges and roles are managed with GRANT and REVOKE.
  • The CREATE USER command will fail if the username already exists in the database.

For further details, consult the H2 SQL Syntax documentation.
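The single-quote rule above is easy to get wrong when credentials are assembled in code. As a minimal illustration (the helper name is our own, not part of H2), this sketch builds the CREATE USER statement and doubles any embedded single quote, which is how H2 string literals escape them:

```python
def create_user_sql(username: str, password: str) -> str:
    """Build an H2 CREATE USER statement.

    H2 string literals escape an embedded single quote by doubling it,
    so the password is escaped before being wrapped in quotes.
    """
    escaped = password.replace("'", "''")
    return f"CREATE USER {username} PASSWORD '{escaped}';"


print(create_user_sql("reporter", "secure123"))
# CREATE USER reporter PASSWORD 'secure123';
```

The same escaping applies to any string literal you send to H2, not just passwords.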

Installing the JDBC Driver

When connecting Matillion Data Productivity Cloud to an H2 Database, a suitable JDBC driver is required. As of the time of writing, the H2 JDBC driver is not bundled with Matillion Data Productivity Cloud due to licensing or redistribution restrictions. This means you will need to manually download and install the appropriate driver before establishing connectivity.

Step 1: Download the H2 JDBC Driver

The H2 database provides its JDBC driver separately. You can download the latest version directly from the official H2 website.

The H2 driver is a Type 4 JDBC driver: pure Java, with no native dependencies. This makes it easy to install and gives maximum flexibility within cloud environments.
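Before moving on to the upload step, it can be worth confirming that the JAR you downloaded actually bundles the driver class (org.h2.Driver is the standard H2 driver class name). A small sketch in Python, assuming the JAR sits on local disk:

```python
import zipfile


def jar_contains_driver(jar_path: str, driver_class: str = "org.h2.Driver") -> bool:
    """Return True if the JAR bundles the given JDBC driver class.

    A JAR is a ZIP archive, so we look for the class-file entry that
    corresponds to the fully qualified class name.
    """
    entry = driver_class.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

A False result usually means you downloaded a sources or javadoc JAR rather than the driver itself.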

Step 2: Install the JDBC Driver in Matillion Data Productivity Cloud

Once the driver JAR file is downloaded, you need to upload it into your Matillion Data Productivity Cloud environment. Refer to the official Matillion documentation for detailed instructions on uploading external drivers.

This guide will walk you through accessing the Agent UI, selecting the correct upload location, and ensuring the driver is recognized by Matillion for use with your data workflows.

Step 3: Configure and Use the H2 Connection

After the driver has been uploaded, you can configure and use connections to H2 databases within Matillion Data Productivity Cloud. Detailed usage instructions, including setting up connections, authentication parameters, and running queries through the platform's interface, are available in the official Matillion documentation.

Make sure to follow any best practices or prerequisites noted in the official documentation when creating your H2 connections and executing queries within Matillion.

Checking Network Connectivity

To enable communication between Matillion Data Productivity Cloud and your H2 Database, you must ensure that the H2 Database accepts incoming connections based on your deployment type:

  • Full SaaS Agent Configuration
    Allow incoming connections from the specific IP addresses documented at: Matillion SaaS Allowed IP Addresses. These are the source IPs from which the Matillion Full SaaS agent will attempt to connect to your H2 Database.

  • Hybrid SaaS Deployment
    Permit incoming connections from the network range(s) that correspond to your own virtual private cloud (VPC). The Matillion Hybrid SaaS agent will connect from your infrastructure, so you should configure your firewall or network security controls accordingly. For tools to verify network access to your database, consult: Matillion Network Access Utilities.

DNS Resolution
If your H2 Database instance is referenced by a DNS name, rather than a direct IP address, ensure that the relevant Matillion agent (Full SaaS or Hybrid SaaS) can successfully resolve the hostname to the correct address. Without DNS resolution, connectivity issues may occur even if network paths are open.
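The two failure modes above, DNS resolution and TCP reachability, can be checked separately from any machine that shares the agent's network path. A minimal sketch with placeholder host and port (H2's TCP server listens on 9092 by default):

```python
import socket


def check_database_reachability(host: str, port: int, timeout: float = 3.0) -> dict:
    """Check DNS resolution first, then TCP reachability, for a database endpoint."""
    result = {"resolved_ip": None, "dns_ok": False, "tcp_ok": False}
    try:
        result["resolved_ip"] = socket.gethostbyname(host)
        result["dns_ok"] = True
    except socket.gaierror:
        return result  # DNS failed; a TCP attempt would be meaningless
    try:
        with socket.create_connection((host, port), timeout=timeout):
            result["tcp_ok"] = True
    except OSError:
        pass  # Closed port, firewall drop, or unreachable host
    return result


# Example (placeholder endpoint):
# check_database_reachability("h2.example.internal", 9092)
```

If dns_ok is False the problem is name resolution, not the firewall; if dns_ok is True but tcp_ok is False, review the IP allow-lists described above.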

Querying Data from an H2 Database

When integrating or migrating data from an H2 Database, it’s important to understand how to query data efficiently and accommodate datatype differences, especially in scenarios like moving data to Snowflake using data ingestion tools (e.g., Matillion).

Examples: H2 Database SQL SELECT Queries

Here are some sample SQL queries you would run on an H2 Database using a Database Query component.

1. Select All Records from a Table

SELECT * FROM EMPLOYEES;

2. Select Specific Columns

SELECT ID, FIRST_NAME, LAST_NAME, START_DATE FROM EMPLOYEES;

3. Filtering Data (using a WHERE clause)

SELECT * FROM ORDERS WHERE ORDER_DATE >= '2024-01-01';

4. Using Aggregates

SELECT DEPARTMENT, COUNT(*) AS NUM_EMPLOYEES
FROM EMPLOYEES
GROUP BY DEPARTMENT;

Datatype Conversion: H2 Database vs. Snowflake

Be aware that when transferring data from H2 to Snowflake, some datatype conversions occur:

  • H2's VARCHAR, CHAR → Snowflake's VARCHAR
  • H2's INT, BIGINT → Snowflake's NUMBER
  • H2's DATE, TIME, TIMESTAMP → Snowflake's corresponding date/time types (DATE, TIME, TIMESTAMP)
  • H2's BOOLEAN → Snowflake's BOOLEAN

Review both systems’ datatype documentation for edge cases, particularly for numerics and binary data.
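In pipeline code, the mapping above reduces to a simple lookup. This sketch covers only the common cases listed; the fall-back to VARCHAR for unmapped types is our own convention, not a Matillion or Snowflake rule:

```python
# Common H2-to-Snowflake type mappings, per the list above
H2_TO_SNOWFLAKE = {
    "VARCHAR": "VARCHAR",
    "CHAR": "VARCHAR",
    "INT": "NUMBER",
    "BIGINT": "NUMBER",
    "DATE": "DATE",
    "TIME": "TIME",
    "TIMESTAMP": "TIMESTAMP",
    "BOOLEAN": "BOOLEAN",
}


def snowflake_type(h2_type: str) -> str:
    """Map an H2 column type to a Snowflake type, defaulting to VARCHAR
    for anything unmapped so the column still loads (as text)."""
    return H2_TO_SNOWFLAKE.get(h2_type.upper(), "VARCHAR")
```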

Load Patterns: Initial and Incremental Loads

Best practice is to use a once-off initial load to onboard all historical data, then regular incremental loads to fetch only new or changed records.

  Load Type        | Filter Clause     | Database Query Example
  Initial Load     | None              | SELECT * FROM EMPLOYEES;
  Incremental Load | Yes (with filter) | SELECT * FROM EMPLOYEES WHERE UPDATED_AT > ?;
  • Initial Load: The query SELECT * FROM TABLE brings in all data.
  • Incremental Load: Add a filter (typically using a date/time or sequence field such as UPDATED_AT) to select only data added/modified since the last run.

For more background, see the Matillion Incremental Load article.

Tip: Always use the same Database Query component for both patterns; modify only the SQL WHERE clause and associated parameters to switch between full and incremental loads.
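The two patterns differ only in the WHERE clause and a stored watermark. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for a database connection; the table and column names mirror the examples above, and a real pipeline would run these queries against H2 through the JDBC driver instead:

```python
import sqlite3


def initial_load(conn):
    """One-off bulk extract: every row, no filter."""
    return conn.execute("SELECT * FROM EMPLOYEES").fetchall()


def incremental_load(conn, last_watermark):
    """Fetch only rows changed since the previous run, then advance the watermark."""
    rows = conn.execute(
        "SELECT * FROM EMPLOYEES WHERE UPDATED_AT > ? ORDER BY UPDATED_AT",
        (last_watermark,),
    ).fetchall()
    # The newest UPDATED_AT seen becomes the watermark for the next run
    new_watermark = rows[-1][-1] if rows else last_watermark
    return rows, new_watermark
```

Persisting the watermark between runs (in a state table or pipeline variable) is what makes the incremental pattern restartable.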

Data Integration Architecture

In the context of modern data pipelines, loading data into Snowflake in advance of integration—an approach central to the ELT (Extract, Load, Transform) architecture—allows organizations to divide and conquer the challenges of data processing by splitting the workflow into two distinct steps. First, all source data is loaded as-is into Snowflake, and then the work of integration and transformation is performed inside the database itself.

Data integration inherently requires data transformation, and the most robust and manageable method is to use data transformation pipelines, which sequence and orchestrate the necessary operations to shape diverse source datasets into a unified schema fit for analysis or downstream use.

A key advantage of the ELT model is that the transformations and integrations take place entirely within the Snowflake platform, leveraging its scalable compute power for fast, on-demand availability of integrated data. This eliminates the need for dedicated external data processing infrastructure, reducing complexity and cost, while ensuring that integration processes benefit from Snowflake's robust, elastic resources.
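The load-then-transform split can be shown in miniature. This sketch again uses sqlite3 as a stand-in for Snowflake, with illustrative table names: raw rows are loaded unchanged, and the aggregation then runs as SQL inside the database that already holds them:

```python
import sqlite3


def elt_pipeline(conn, raw_rows):
    """ELT in miniature: load source rows as-is, then transform with SQL
    inside the database, where the data already lives."""
    # Step 1 (Extract + Load): land the source data untouched
    conn.execute("CREATE TABLE RAW_EMPLOYEES (ID INTEGER, DEPARTMENT TEXT)")
    conn.executemany("INSERT INTO RAW_EMPLOYEES VALUES (?, ?)", raw_rows)
    # Step 2 (Transform): integration logic runs on the warehouse's own compute
    conn.execute(
        "CREATE TABLE DEPT_SUMMARY AS "
        "SELECT DEPARTMENT, COUNT(*) AS NUM_EMPLOYEES "
        "FROM RAW_EMPLOYEES GROUP BY DEPARTMENT"
    )
    return conn.execute("SELECT * FROM DEPT_SUMMARY ORDER BY DEPARTMENT").fetchall()
```

In Snowflake itself the same shape holds: a COPY INTO or connector load into a raw table, followed by SQL transformation pipelines over it.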

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.