Extracting data from H2 Database to Snowflake
Extracting data from H2 Database is a common requirement for organizations looking to leverage cloud data warehousing platforms such as Snowflake. This process, however, involves several key steps to ensure data integrity, security, and efficiency throughout the migration journey. In this article, we will guide you through the essential steps needed to move your data from H2 Database to Snowflake. We will begin by demonstrating how to create an identity in H2 Database to properly authenticate and authorize data access. For those using Matillion, we will show you how to check for—or acquire—the necessary H2 JDBC driver to establish a reliable connection between the systems. We will also discuss the importance of configuring secure network connectivity between your H2 source and Snowflake target, ensuring data flows smoothly and safely. Finally, we will explore strategies for querying and extracting data, including both initial bulk transfers and incremental updates to keep your Snowflake environment in sync with changes in the source database. Whether you are looking to perform a one-time migration or maintain ongoing synchronization, these best practices will help streamline the process and deliver accurate results.
What is H2 Database?
H2 Database is a lightweight, fast, open-source RDBMS written in Java, ideal for embedding in Java applications. It supports standard SQL, transactions, MVCC, and offers both in-memory and persistent storage. H2 features a robust JDBC API for easy Java integration and a browser-based console for administration and queries. Compatible with PostgreSQL and MySQL, it simplifies migration and testing. Its ease of use, high performance, and minimal configuration make it a popular choice for unit testing, prototyping, and lightweight production workloads.
What is Snowflake?
Snowflake is a fully managed, cloud-native data platform designed for high-performance data warehousing, analytics, and data sharing. Its unique architecture separates compute from storage, enabling seamless scalability and concurrent user access without contention. Snowflake supports SQL and semi-structured data (like JSON, Parquet), powering workloads from ETL to advanced analytics. It features automatic scaling, robust security, and secure data sharing, with minimal maintenance—freeing teams to focus on analysis over administration. Available across major cloud providers, Snowflake offers flexibility and high availability, making it ideal for enterprises handling complex, evolving data needs.
Why Move Data from H2 Database into Snowflake
Unlocking Advanced Analytics: Transferring Data from H2 to Snowflake
A data engineer or architect may consider copying data from an H2 Database into Snowflake for several compelling reasons. Firstly, the H2 Database might contain datasets that are potentially valuable for deeper analytics or business intelligence purposes. However, the full value of this data often emerges only when it is integrated with information from other sources, enabling enriched analysis and more informed decision-making. By utilizing Snowflake as the platform for this data integration, engineers can execute complex transformations, aggregations, and queries without imposing additional workload or performance overhead on the H2 Database, which is often optimized for transactional efficiency rather than analytics. As a result, organizations can unlock the insights contained within their H2 Database while maintaining the operational stability of their existing systems.
Creating an Identity in H2 Database
In H2 Database, "users" are managed at the database level and determine access rights and authentication. You can create new users via SQL commands executed in your H2 client (such as the H2 Console or JDBC connection).
Instructions

- Connect as an Administrator
To create a new user, you must be connected as a user with administrative rights (commonly SA by default).

- Execute the CREATE USER Statement
Use the following SQL command, substituting <username> and <password> with your chosen credentials:

CREATE USER <username> PASSWORD '<password>';

Example: To create a user named reporter with password secure123, execute:

CREATE USER reporter PASSWORD 'secure123';

- Grant Privileges (Optional)
By default, the new user will not have any access to schema objects. To grant access, use the GRANT statement.

Example: To grant all privileges on a schema or table to your new user:

GRANT ALL ON SCHEMA PUBLIC TO reporter;

Or, more restrictively, on a specific table:

GRANT SELECT, INSERT ON TABLE my_table TO reporter;

- Verify the New User
Ensure the user can connect by disconnecting from the current session and reconnecting using the new credentials.

Notes

- You must use single quotes around the password in the SQL statement.
- Administrative users (e.g., SA) can manage all user accounts.
- H2 does not support the ALTER USER statement to grant roles or privileges; manage access via GRANT and REVOKE.
- The CREATE USER command will fail if the username already exists in the database.
For further details, consult the H2 SQL Syntax documentation.
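If you are scripting user provisioning rather than typing statements into the H2 Console, it can help to generate the SQL programmatically. The sketch below is an illustration only (the helper names `create_user_sql` and `grant_sql` are not part of any H2 API); it builds the same statements shown in the steps above, with a basic username check and standard single-quote escaping for the password.

```python
import re

def create_user_sql(username: str, password: str) -> str:
    """Build an H2 CREATE USER statement, validating the username and
    escaping any single quotes in the password (SQL doubles them)."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", username):
        raise ValueError(f"invalid H2 username: {username!r}")
    escaped = password.replace("'", "''")
    return f"CREATE USER {username} PASSWORD '{escaped}';"

def grant_sql(privileges: list[str], obj: str, username: str) -> str:
    """Build a GRANT statement for the given privileges and object."""
    return f"GRANT {', '.join(privileges)} ON {obj} TO {username};"

# The statements from the steps above:
print(create_user_sql("reporter", "secure123"))
# CREATE USER reporter PASSWORD 'secure123';
print(grant_sql(["SELECT", "INSERT"], "TABLE my_table", "reporter"))
# GRANT SELECT, INSERT ON TABLE my_table TO reporter;
```

You would then execute the generated statements through your H2 client (Console or JDBC connection) as an administrative user.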
Installing the JDBC Driver
When connecting Matillion Data Productivity Cloud to an H2 Database, a suitable JDBC driver is required. As of the time of writing, the H2 JDBC driver is not bundled with Matillion Data Productivity Cloud due to licensing or redistribution restrictions. This means you will need to manually download and install the appropriate driver before establishing connectivity.
Step 1: Download the H2 JDBC Driver
The H2 database provides its JDBC driver separately. You can download the latest version directly from the official H2 website:
When downloading, choose a Type 4 JDBC driver. Type 4 drivers are preferable since they are pure Java and do not require native dependencies, offering maximum flexibility and ease of installation within cloud environments.
Step 2: Install the JDBC Driver in Matillion Data Productivity Cloud
Once the driver JAR file is downloaded, you need to upload it into your Matillion Data Productivity Cloud environment. Refer to the official Matillion documentation for detailed instructions on uploading external drivers:
This guide will walk you through accessing the Agent UI, selecting the correct upload location, and ensuring the driver is recognized by Matillion for use with your data workflows.
Step 3: Configure and Use the H2 Connection
After the driver has been uploaded, you can configure and use connections to H2 databases within Matillion Data Productivity Cloud. Detailed usage instructions, including setting up connections, authentication parameters, and running queries through the platform's interface, are found here:
Make sure to follow any best practices or prerequisites noted in the official documentation when creating your H2 connections and executing queries within Matillion.
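When configuring the connection, the key parameter is the JDBC URL. For an H2 server running in TCP mode the URL takes the form `jdbc:h2:tcp://<host>:<port>/<database path>`, with 9092 as H2's default TCP server port. The small sketch below (the helper name `h2_jdbc_url` is illustrative, not a Matillion or H2 API) shows how such a URL is assembled:

```python
def h2_jdbc_url(host: str, database_path: str, port: int = 9092) -> str:
    """Build a JDBC URL for an H2 database served over TCP.
    9092 is H2's default TCP server port."""
    return f"jdbc:h2:tcp://{host}:{port}/{database_path}"

# Example: a server-mode database stored under the H2 user's home directory
print(h2_jdbc_url("db.internal.example.com", "~/appdata"))
# jdbc:h2:tcp://db.internal.example.com:9092/~/appdata
```

Note that embedded URLs (e.g. `jdbc:h2:~/test`) point at a local file and are not reachable from a remote agent; for Matillion connectivity the database must be exposed in server (TCP) mode.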
Checking Network Connectivity
To enable communication between Matillion Data Productivity Cloud and your H2 Database, you must ensure that the H2 Database accepts incoming connections based on your deployment type:
- Full SaaS Agent Configuration
Allow incoming connections from the specific IP addresses documented at: Matillion SaaS Allowed IP Addresses. These are the source IPs from which the Matillion Full SaaS agent will attempt to connect to your H2 Database.

- Hybrid SaaS Deployment
Permit incoming connections from the network range(s) that correspond to your own virtual private cloud (VPC). The Matillion Hybrid SaaS agent will connect from your infrastructure, so you should configure your firewall or network security controls accordingly. For tools to verify network access to your database, consult: Matillion Network Access Utilities.
DNS Resolution
If your H2 Database instance is referenced by a DNS name, rather than a direct IP address, ensure that the relevant Matillion agent (Full SaaS or Hybrid SaaS) can successfully resolve the hostname to the correct address. Without DNS resolution, connectivity issues may occur even if network paths are open.
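Both checks described above, DNS resolution and network reachability, can be verified from the agent's network with a few lines of standard-library Python. This is a diagnostic sketch, not a Matillion utility; the function name and result shape are my own, and 9092 is assumed as the H2 TCP port.

```python
import socket

def check_connectivity(host: str, port: int, timeout: float = 5.0) -> dict:
    """Resolve a hostname and attempt a TCP connection, mirroring the two
    checks needed for agent-to-database connectivity: DNS and network path."""
    result = {"host": host, "port": port, "resolved_ip": None, "reachable": False}
    try:
        result["resolved_ip"] = socket.gethostbyname(host)
    except socket.gaierror:
        return result  # DNS resolution failed; no point attempting TCP
    try:
        with socket.create_connection((result["resolved_ip"], port), timeout=timeout):
            result["reachable"] = True
    except OSError:
        pass  # port closed, filtered by a firewall, or host unreachable
    return result

# Example (9092 is H2's default TCP server port):
print(check_connectivity("localhost", 9092))
```

If `resolved_ip` is `None`, fix DNS first; if resolution succeeds but `reachable` is `False`, review firewall rules and the allowed IP ranges described above.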
Querying Data from an H2 Database
When integrating or migrating data from an H2 Database, it’s important to understand how to query data efficiently and accommodate datatype differences, especially in scenarios like moving data to Snowflake using data ingestion tools (e.g., Matillion).
Examples: H2 Database SQL SELECT Queries
Here are some sample SQL queries you would run on an H2 Database using a Database Query component.
1. Select All Records from a Table
SELECT * FROM EMPLOYEES;
2. Select Specific Columns
SELECT ID, FIRST_NAME, LAST_NAME, START_DATE FROM EMPLOYEES;
3. Filtering Data (using a WHERE clause)
SELECT * FROM ORDERS WHERE ORDER_DATE >= '2024-01-01';
4. Using Aggregates
SELECT DEPARTMENT, COUNT(*) AS NUM_EMPLOYEES FROM EMPLOYEES GROUP BY DEPARTMENT;
Datatype Conversion: H2 Database vs. Snowflake
Be aware that when transferring data from H2 to Snowflake, some datatype conversions occur:
- H2's VARCHAR, CHAR → Snowflake's VARCHAR
- H2's INT, BIGINT → Snowflake's NUMBER
- H2's DATE, TIME, TIMESTAMP → Snowflake's corresponding date/time types (DATE, TIME, TIMESTAMP)
- H2's BOOLEAN → Snowflake's BOOLEAN
Review both systems’ datatype documentation for edge cases, particularly for numerics and binary data.
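If you need to apply this mapping programmatically, for example when generating Snowflake DDL from H2 metadata, the pairs above can be expressed as a simple lookup table. This is a deliberately simplified sketch based only on the list above; a real migration should consult both vendors' type documentation, especially for precision, scale, and binary types.

```python
# Simplified H2 → Snowflake type mapping, taken from the pairs listed above.
H2_TO_SNOWFLAKE = {
    "VARCHAR": "VARCHAR",
    "CHAR": "VARCHAR",
    "INT": "NUMBER",
    "BIGINT": "NUMBER",
    "DATE": "DATE",
    "TIME": "TIME",
    "TIMESTAMP": "TIMESTAMP",
    "BOOLEAN": "BOOLEAN",
}

def snowflake_type(h2_type: str) -> str:
    """Map an H2 column type to a Snowflake type, falling back to
    VARCHAR for anything not covered by the simplified table."""
    return H2_TO_SNOWFLAKE.get(h2_type.upper(), "VARCHAR")

print(snowflake_type("bigint"))  # NUMBER
```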
Load Patterns: Initial and Incremental Loads
Best practice is to run a one-off initial load to onboard all historical data, followed by regular incremental loads that fetch only new or changed records.
| Load Type | Filter Clause | Database Query Example |
|---|---|---|
| Initial Load | None | SELECT * FROM EMPLOYEES; |
| Incremental Load | Yes (with filter) | SELECT * FROM EMPLOYEES WHERE UPDATED_AT > ?; |
- Initial Load: A query such as SELECT * FROM <table> brings in all data.
- Incremental Load: Add a filter (typically using a date/time or sequence field such as UPDATED_AT) to select only data added or modified since the last run.
For more background, see the Matillion Incremental Load article.
Tip: Always use the same Database Query component for both patterns; modify only the SQL WHERE clause and associated parameters to switch between full and incremental loads.
Data Integration Architecture
In the context of modern data pipelines, loading data into Snowflake in advance of integration—an approach central to the ELT (Extract, Load, Transform) architecture—allows organizations to divide and conquer the challenges of data processing by splitting the workflow into two distinct steps. First, all source data is loaded as-is into Snowflake, and then the work of integration and transformation is performed inside the database itself. Data integration inherently requires data transformation, and the most robust and manageable method is to use data transformation pipelines, which sequence and orchestrate the necessary operations to shape diverse source datasets into a unified schema fit for analysis or downstream use. A key advantage of the ELT model is that the transformations and integrations take place entirely within the Snowflake platform, leveraging its scalable compute power for fast, on-demand availability of integrated data. This eliminates the need for dedicated external data processing infrastructure, reducing complexity and cost, while ensuring that integration processes benefit from Snowflake’s robust, elastic resources.