Integrate data from VoltDB (NewSQL) to Snowflake using Matillion

Our VoltDB to Snowflake connector enables seamless and timely data transfers to Snowflake, ensuring your information stays current without requiring manual coding or the management of complex ETL scripts.

VoltDB
Snowflake
VoltDB (NewSQL) to Snowflake banner

Extracting data from VoltDB (NewSQL) to Snowflake

Extracting data from VoltDB is a critical step for organizations seeking to leverage high-performance transactional data within analytical platforms such as Snowflake. This process enables efficient movement of operational data into environments optimized for reporting, advanced analytics, and long-term storage. In this article, we will walk through the key steps required to transfer data from VoltDB to Snowflake. We begin by outlining how to create an access identity in VoltDB to ensure secure connections. For users working with Matillion, we will discuss how to check for, or acquire, the appropriate JDBC driver required for connecting to VoltDB. We will also cover essential considerations regarding network connectivity between VoltDB and Snowflake, helping you verify that data can flow smoothly between the systems. Finally, we will demonstrate best practices for querying and extracting initial data sets, as well as strategies for incremental data loads to keep Snowflake up to date with changes in VoltDB. Whether you are setting up this data pipeline for the first time or seeking to refine your current process, the following guidance will help you establish a reliable and efficient data movement solution.


What is VoltDB (NewSQL)?

VoltDB is a high-performance, in-memory, relational NewSQL database designed for applications demanding low-latency transaction processing and real-time analytics. Unlike traditional disk-based databases, VoltDB leverages a shared-nothing architecture and stores data entirely in memory, allowing it to support thousands of transactions per second with sub-millisecond latency. It ensures ACID compliance through synchronous replication and strong consistency, making it suitable for mission-critical workloads such as financial services, telecommunications, and IoT event processing. VoltDB’s native support for horizontal scaling, stored procedures in Java, and SQL access allow developers to build applications that process streaming data in real time while maintaining transactional integrity and durability through continuous snapshots and command logging.

matillion logo x VoltDB

What is Snowflake?

Snowflake is a fully managed, cloud-native data platform designed for scalable analytics, including data warehousing, lakes, engineering, and sharing. Built on a multi-cluster, shared data architecture, it separates compute, storage, and services, allowing elastic resource scaling without infrastructure management. Snowflake supports semi-structured data formats (JSON, Avro, Parquet), offers a native SQL interface, and integrates seamlessly with BI and ETL tools. Key features include automatic encryption, zero-copy cloning, time travel, and secure data sharing, enabling collaborative and rapid analytics development across teams and locations, making it a versatile choice for modern analytics needs.

Why Move Data from VoltDB (NewSQL) into Snowflake

Unlocking Analytical Insights: Copying Data from VoltDB to Snowflake

There are several reasons why a data engineer or architect might choose to copy data from VoltDB into Snowflake. Firstly, VoltDB typically contains transactional or real-time data that can be highly valuable when leveraged for analytics, reporting, or business intelligence. By integrating this data with information from other sources—such as CRM systems, data lakes, or historical archives—organizations can uncover deeper insights, enhance their decision-making, and realize the full potential of their data assets. Furthermore, performing data integration and analysis directly within Snowflake avoids placing additional processing demands on VoltDB, which is often optimized for high-throughput transaction processing rather than complex analytical workloads. Leveraging Snowflake's scalable and flexible platform thus enables efficient analysis without compromising the performance and reliability of mission-critical systems that rely on VoltDB.

Creating a User in VoltDB

VoltDB provides an authentication mechanism based on user credentials. Users are defined in a deployment configuration file (typically

deployment.xml
) rather than by issuing SQL statements during normal operation. This approach ensures centralized management and greater security.

Step 1: Update the Deployment Configuration

To create a new user, you need to edit the

deployment.xml
file for your VoltDB database. Add or modify the
<users>
section. For example, the following snippet creates a user named
dbuser
with password authentication:

xml
<deployment>
  ...
  <users>
    <user name="dbuser">
      <password>securePassword123</password>
    </user>
  </users>
  ...
</deployment>

Notes: - The

name
attribute specifies the username. - Supply a strong value for
<password>
. - You can define multiple
<user>
blocks for additional users.

Step 2: Assign User Permissions (Optional)

You can restrict permissions using the

<roles>
option. The roles can specify allowed procedures and other access controls (consult the VoltDB documentation for advanced permission schemes). For full access, define the user as shown above. For more granular control:

xml
<user name="readonly">
  <password>readonlyPass</password>
  <roles>readonly</roles>
</user>

Step 3: Deploy the Configuration

After updating the deployment file:

  1. Save your changes to
    deployment.xml
    .
  2. If the VoltDB cluster is running, shutdown the cluster gracefully.
  3. Start (or restart) the VoltDB server using the updated deployment file:

voltadmin shutdown
   voltdb start --deployment=deployment.xml

or, for a new cluster:

voltdb init --deployment=deployment.xml
   voltdb start

Step 4: Connect Using the New User

When connecting to VoltDB (e.g., using

sqlcmd
or a client driver), supply the username and password:

sqlcmd --user=dbuser --password=securePassword123

Or configure the authentication options in your application’s VoltDB client connection logic.


For further detail, always review the latest VoltDB documentation on security configuration and authentication best practices.

Installing the JDBC driver

At the time of writing, the JDBC driver for VoltDB is not included by default in Matillion Data Productivity Cloud due to licensing or redistribution restrictions. To use VoltDB as a data source, you must manually download and install the driver as an external component.

Step 1: Download the JDBC Driver

  1. Navigate to the VoltDB download page: https://www.voltdb.com/download/
  2. On the downloads page, look specifically for the Type 4 JDBC driver, as this provides pure Java connectivity and is most suitable for integration in environments like Matillion Data Productivity Cloud.
  3. Download the appropriate VoltDB JDBC driver archive for your version and platform.

Step 2: Install the Driver in Matillion Data Productivity Cloud

Since the driver is not bundled with Matillion Data Productivity Cloud, you must upload it manually:

  1. Follow the installation guidance at: https://docs.matillion.com/data-productivity-cloud/agent/docs/uploading-external-drivers/
  2. Prepare the VoltDB JDBC driver
    .jar
    file you downloaded in the previous step.
  3. Use your Matillion Data Productivity Cloud Agent interface to upload the JDBC driver in accordance with the documented process, ensuring it is available for use within Matillion components.

Step 3: Configure and Use the Driver

After successful installation, you may configure connection settings and utilize the driver within your Matillion Data Productivity Cloud projects:

By completing these steps, you'll have enabled Matillion Data Productivity Cloud to connect and interact with VoltDB using the JDBC driver.

Checking network connectivity

Before connecting Matillion Data Productivity Cloud to your VoltDB database, you must ensure that VoltDB is configured to accept incoming connections based on your deployment type:

Additionally, if your VoltDB instance is referenced using a DNS name, you must ensure that the Full SaaS or Hybrid SaaS agent can resolve the DNS address correctly. This may require updating your DNS settings or allowing appropriate DNS traffic for name resolution.

Querying Data from VoltDB

This guide provides technical instructions for querying data from a VoltDB database, including example SQL

SELECT
statements, explanations of datatype conversion between VoltDB and Snowflake, and recommended ETL strategies using the Database Query component.


Example VoltDB SQL SELECT Statements

VoltDB supports standard SQL for querying data. Here are examples for common query patterns:

``` -- Select all columns from a table SELECT * FROM customer;

-- Select specific columns SELECT customer_id, name, balance FROM customer;

-- Use conditions (WHERE clause) SELECT order_id, product_id, quantity FROM orders WHERE order_date >= '2024-01-01';

-- Use aggregation and grouping SELECT product_id, SUM(amount) as total_sales FROM sales WHERE sale_date BETWEEN '2024-01-01' AND '2024-03-31' GROUP BY product_id HAVING total_sales > 1000; ```

Datatype Conversion: VoltDB ↔️ Snowflake

When transferring data from VoltDB to Snowflake, be aware that datatype conversion may occur. For example:

VoltDB Type Approximate Snowflake Type
INTEGER NUMBER
BIGINT NUMBER
FLOAT FLOAT
DECIMAL(p,s) NUMBER(p,s)
VARCHAR(n) VARCHAR(n)
TIMESTAMP TIMESTAMP_NTZ
BOOLEAN BOOLEAN

Ensure you check for compatibility, especially for precise types like

DECIMAL
or for date/time fields.

Best Pattern: Initial Load and Incremental Loads

A recommended approach to querying (and replicating) VoltDB data is to perform:

  1. Initial Load: Extract the full dataset from VoltDB.
  2. Incremental Loads: Subsequently extract only new or updated records.

This pattern facilitates efficient ETL and synchronization with downstream systems such as Snowflake.

Using the Database Query Component

Both loading phases can use the same Database Query component, typically within tools like Matillion.

  • Initial Load:
  • Query: No filter clause—extracts all data.
  • Example:

    SELECT * FROM orders;

  • Incremental Load:

  • Query: Includes a filter to extract data modified or added since the last run.
  • Example (using a high-water mark column such as
    updated_at
    ):
    SELECT * FROM orders WHERE updated_at > '2024-06-01T00:00:00';

Adjust the filter value programmatically based on the last load's maximum timestamp or sequence id.

For further details on incremental load strategies, see the Incremental Load / Data Replication Strategy.


This pattern ensures a maintainable, efficient, and robust data integration process when extracting from VoltDB and loading into data warehouses like Snowflake.

Data Integration Architecture

One advantage of the ELT (Extract, Load, Transform) architecture is its divide-and-conquer approach: by loading data into Snowflake ahead of integration, the process is split into two manageable steps. This allows teams to separate the challenges of data ingestion from those of data transformation. Effective data integration relies on robust transformation, and leveraging data transformation pipelines is the recommended approach, as these pipelines automate and standardize the necessary modifications to raw data. Furthermore, ELT offers the significant benefit of performing all data transformations and integrations directly within the Snowflake database itself. This in-database processing is inherently fast, scales with demand, and eliminates the need for separate, costly data processing infrastructure—making data integration both efficient and economical.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.