Integrate data from MonetDB to Snowflake using Matillion

Our MonetDB to Snowflake connector streamlines data transfer within minutes, ensuring your information remains current without requiring manual coding or complex ETL processes.


Extracting data from MonetDB to Snowflake

Extracting data from MonetDB is an essential task for organizations seeking to leverage modern cloud-based platforms such as Snowflake for advanced analytics and scalable storage. This article provides a practical guide for efficiently moving your data from MonetDB to Snowflake, ensuring data integrity and optimal performance throughout the process.

We will begin by outlining how to create an identity in MonetDB, which is a fundamental step for managing user access and securely performing data extractions. For users of Matillion, the popular ETL (Extract, Transform, Load) tool, we will cover how to check for and acquire the correct JDBC driver required to establish a reliable connection with MonetDB. Network connectivity is another crucial consideration, and we will discuss the necessary steps to ensure secure and seamless data transfer from the MonetDB source to the Snowflake target environment.

Finally, we will explore best practices for querying data—both for your initial full data loads and for implementing efficient incremental loads as your datasets evolve. Whether you are performing a one-time migration or setting up a recurring data pipeline, this article will equip you with the knowledge needed to extract and load your MonetDB data into Snowflake with confidence.


What is MonetDB?

MonetDB is an open-source column-store database optimized for high-performance analytics and complex queries, especially in data warehousing and business intelligence. Developed at CWI in the Netherlands, it uses vertical fragmentation and vectorized execution for faster read-heavy operations. Unlike traditional row-based databases, MonetDB features automatic indexing, late materialization, and adaptive query optimization, boosting throughput for large, ad hoc SQL queries. Its extensible architecture supports integration with multiple programming languages and analytical tools, making it popular among data scientists and engineers managing large, evolving datasets.


What is Snowflake?

Snowflake is a cloud-native data warehousing platform that handles large-scale data storage, processing, and analytics with scalable multi-cluster architecture. It separates compute and storage, enabling independent, on-demand scaling and optimal performance. Supporting both structured and semi-structured data (like JSON, Avro, Parquet) and native SQL, it automates infrastructure and optimization tasks, reducing administrative burden. Snowflake features robust security, integrates broadly with data pipelines and BI tools, and offers capabilities like time travel and secure data sharing. This makes it a powerful solution for organizations seeking to centralize, modernize, and simplify data management in the cloud.

Why Move Data from MonetDB into Snowflake

The Case for Replicating MonetDB Data in Snowflake for Enhanced Analytics

A data engineer or architect may choose to copy data from MonetDB into Snowflake for several compelling reasons. MonetDB often contains potentially valuable data, derived from transaction systems, analytics, or historical records, which can provide significant insights when utilized effectively. However, the true value of this data is typically realized when it is integrated with information from other sources to form a unified analytics platform. By moving copies of MonetDB data into Snowflake, practitioners can take advantage of Snowflake's scalable cloud architecture to perform complex data integration, transformation, and analysis. This approach ensures that heavy query workloads and computational processes run on Snowflake, thus avoiding any adverse impact on MonetDB’s primary transactional or operational performance, and helps maintain the stability and responsiveness of the source system.

Creating a User in MonetDB

To add a new user (identity) in a MonetDB database, you will typically use SQL commands executed from the `mclient` command-line tool or any other SQL interface connected to your MonetDB server.

1. Connect to MonetDB

Before you create a new user, connect to your MonetDB database with a user that has sufficient privileges (typically the `monetdb` administrator account):

mclient -u <admin_user> -d <database_name>

Replace `<admin_user>` and `<database_name>` with your actual administrator username and the database name.

2. Creating a New User

The following SQL command creates a user. Note that in most MonetDB releases the `NAME` (full name) and `SCHEMA` (default schema) clauses are required:

CREATE USER "<username>" WITH PASSWORD '<password>' NAME '<full name>' SCHEMA "<schema>";

  • `<username>`: The name for the new user. MonetDB usernames are case-sensitive; quote them if mixed case or special characters are required.
  • `<password>`: The password the user will use to authenticate.
  • `<full name>`: A descriptive display name for the user.
  • `<schema>`: The user's default schema (for example, `sys`).

Example:

CREATE USER "alice" WITH PASSWORD 'MySecureP@ssw0rd' NAME 'Alice Analyst' SCHEMA "sys";

3. Grant Privileges to the User

Newly created users do not have permissions on tables outside their own schema by default. Grant specific privileges according to your requirements. MonetDB grants privileges at the table level; for example, to give access to a table in the `public` schema:

GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE "public"."customers" TO "alice";

Repeat the `GRANT` statement for each table the user needs, or use a role (see the next step) that already holds the required privileges.

4. (Optional) Assign a Role to the User

You may want to assign the user to a predefined role for role-based access control:

GRANT "<rolename>" TO "alice";

Example:

GRANT "readers" TO "alice";

Note that in MonetDB the user must activate the role within a session (`SET ROLE "readers";`) before its privileges take effect.

5. Verify the New User

You can verify user creation by querying the system catalog:

SELECT * FROM sys.users;

This will return the list of all users currently configured in your MonetDB database.


Please refer to the MonetDB official documentation for advanced configuration and best practices concerning database security and user management.

Installing the JDBC Driver

At the time of writing, the MonetDB JDBC driver is not included in Matillion Data Productivity Cloud by default, due to licensing or redistribution restrictions. Therefore, you will need to obtain and install the driver manually before you can connect Matillion to a MonetDB database.

1. Downloading the MonetDB JDBC Driver

Begin by visiting the official MonetDB JDBC documentation and download page:

Ensure you select the appropriate driver version for your MonetDB release and Java environment. For optimal compatibility and performance, prioritize downloading the Type 4 JDBC driver, which is a pure Java implementation and does not require any native libraries.

2. Installing the JDBC Driver in Matillion Data Productivity Cloud

Once you have downloaded the MonetDB JDBC driver (typically as a `.jar` file), it must be uploaded to your Matillion Data Productivity Cloud environment. The process for uploading and installing external JDBC drivers is detailed in the Matillion documentation at this location:

Follow these steps, which generally involve:

  • Accessing the correct Agent environment within your Matillion instance.
  • Uploading the JDBC driver `.jar` via the interface or specified method.
  • Restarting the agent or service if prompted, to ensure the driver is recognized.
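Once the driver is installed, the connection URL you supply to Matillion follows the MonetDB JDBC driver's `jdbc:monetdb://<host>:<port>/<database>` convention. The helper below is a minimal sketch of how such a URL is assembled; the host and database names are placeholders, and 50000 is MonetDB's default listener port.

```python
# Sketch: assemble a MonetDB JDBC connection URL for use in a
# Matillion Database Query component. Host/database values are
# placeholders; 50000 is the MonetDB default port.

def monetdb_jdbc_url(host: str, database: str, port: int = 50000) -> str:
    """Return a JDBC URL in the form jdbc:monetdb://host:port/database."""
    return f"jdbc:monetdb://{host}:{port}/{database}"

print(monetdb_jdbc_url("db.example.internal", "warehouse"))
# jdbc:monetdb://db.example.internal:50000/warehouse
```

Additional driver properties (user, password, TLS options) are normally supplied through the component's separate credential fields rather than embedded in the URL.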

3. Using the MonetDB JDBC Driver in a Database Query Component

After successful installation, you can configure and use the MonetDB JDBC driver with database query components in Matillion. For specific guidance on connecting to your MonetDB database and running queries, refer to the following documentation:

This resource covers details such as specifying JDBC connection URLs, authentication, advanced settings, and component usage scenarios within your data workflows.

Checking network connectivity

To establish a connection between Matillion Data Productivity Cloud and your MonetDB database, you must ensure that your MonetDB database instance allows incoming network connections, based on your Matillion deployment type:

  • Full SaaS Agent: Configure your MonetDB instance’s firewall or network security settings to accept incoming connections from the set of Matillion IP addresses listed at Matillion Allowing IP Addresses.

  • Hybrid SaaS Deployment: Allow incoming connections from your own virtual private cloud (VPC), where your Matillion Hybrid SaaS Agent is deployed. To test and verify connectivity, you can use utilities available from Matillion Exchange: Check Network Access.

Additionally, if you are referencing your MonetDB database by its DNS name (and not by direct IP address), make sure that the Full SaaS or Hybrid SaaS Agent can resolve this DNS address to the correct IP. This ensures reliable connectivity and prevents potential name resolution errors during connection attempts.
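A quick way to confirm both DNS resolution and port reachability from the agent's environment is a plain TCP probe, sketched below. The host name and port are placeholders; MonetDB listens on port 50000 by default.

```python
# Sketch: verify that the agent host can resolve the MonetDB DNS name
# and open a TCP connection to its port before configuring Matillion.
import socket

def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Resolve the host and attempt a TCP connection; True if reachable."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failure, connection refused, and timeout
        return False

# Example (placeholder host): can_reach("monetdb.example.internal", 50000)
```

A `False` result distinguishes network or firewall problems from driver or credential problems early, which keeps Matillion connection troubleshooting focused.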

Querying Data from MonetDB

MonetDB is a column-oriented, open-source database management system optimized for high-performance querying of large datasets. This guide provides instructions for technical users on querying data from MonetDB, especially in contexts involving data conversion to systems like Snowflake and recommended loading patterns.

Example SQL SELECT Queries in MonetDB

Typical queries to retrieve data from MonetDB use standard SQL `SELECT` statements. Examples:

```
-- Selecting all columns from a table named customers
SELECT * FROM customers;

-- Selecting specific columns
SELECT customer_id, first_name, last_name
FROM customers;

-- Filtering with a WHERE clause
SELECT * FROM orders
WHERE order_date >= '2024-01-01';

-- Aggregation example
SELECT status, COUNT(*) AS order_count
FROM orders
GROUP BY status;

-- Joining tables
SELECT o.order_id, c.first_name, c.last_name, o.total_amount
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date >= '2024-01-01';
```

Datatype Conversion: MonetDB to Snowflake

When transferring data between MonetDB and Snowflake, watch for datatype compatibility issues:

  • MonetDB `INTEGER` → Snowflake `NUMBER`
  • MonetDB `VARCHAR` → Snowflake `TEXT`
  • MonetDB `TIMESTAMP` → Snowflake `TIMESTAMP_NTZ`
  • MonetDB `FLOAT` or `DOUBLE` → Snowflake `FLOAT`
  • MonetDB `BLOB` / `CLOB` → Snowflake `BINARY` / `TEXT`

Ensure that your data loading tools or query components correctly map and convert types to avoid data precision and compatibility issues.
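The mapping above can be captured as a simple lookup that a loading tool might consult when generating Snowflake DDL. This is an illustrative sketch, not part of any Matillion API; the `TEXT` fallback for unknown types is an assumption you should adjust for your schema.

```python
# Sketch of the MonetDB-to-Snowflake type mapping listed above.
# The fallback to TEXT for unlisted types is an illustrative choice.
MONETDB_TO_SNOWFLAKE = {
    "INTEGER":   "NUMBER",
    "VARCHAR":   "TEXT",
    "TIMESTAMP": "TIMESTAMP_NTZ",
    "FLOAT":     "FLOAT",
    "DOUBLE":    "FLOAT",
    "BLOB":      "BINARY",
    "CLOB":      "TEXT",
}

def snowflake_type(monetdb_type: str) -> str:
    """Map a MonetDB type name to its Snowflake equivalent."""
    return MONETDB_TO_SNOWFLAKE.get(monetdb_type.upper(), "TEXT")

print(snowflake_type("timestamp"))  # TIMESTAMP_NTZ
```

Extending the table (for example with `DECIMAL` precision and scale) is where most real-world mapping effort goes, since numeric precision mismatches are a common source of silent data corruption.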

Best Practice: Initial and Incremental Loads

The most effective pattern when replicating data from MonetDB to a data warehouse (like Snowflake) involves:

  1. Once-Off Initial Load: Load the full dataset.
  2. Incremental Loads: Repeatedly load only new or changed records.

Using the Same Database Query Component

Both load types should use the same Database Query component (such as in Matillion ETL or another ELT tool). The difference is in the filter used in the query.

Initial Load Example

For the first (initial) load, the SQL SELECT query typically has no filter clause, pulling the entire dataset:

SELECT * FROM orders;

Incremental Load Example

For subsequent (incremental) loads, the Database Query component uses a filter clause (often a `WHERE` clause utilizing a modification timestamp or an incrementing key) to load only new or updated records:

SELECT * FROM orders
WHERE order_date >= '2024-06-01';

Or, using an incrementing primary key:

SELECT * FROM orders
WHERE order_id > 123456;

For more details, see Incremental Load Data Replication Strategy.

Note: Proper tracking of watermark columns (such as the last loaded `order_date` or `order_id`) is essential in your workflow for robust incremental data loading.
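The initial-versus-incremental pattern can be sketched as a small query builder driven by the stored watermark. This is an illustrative sketch only: in practice the watermark would be persisted (for example in a Matillion job variable or a control table), and the query text would be supplied to the Database Query component rather than built by hand.

```python
# Sketch: choose between a full initial extract and a watermark-filtered
# incremental extract. Plain string formatting is used for illustration
# only; a real pipeline should use parameterized queries.

def build_query(table: str, watermark_column: str, watermark=None) -> str:
    """Full extract when no watermark exists yet, filtered extract otherwise."""
    if watermark is None:
        return f"SELECT * FROM {table};"  # once-off initial load
    # Strict '>' avoids re-reading the boundary row; use '>=' plus
    # downstream de-duplication if late-arriving updates are possible.
    return f"SELECT * FROM {table} WHERE {watermark_column} > '{watermark}';"

print(build_query("orders", "order_date"))
print(build_query("orders", "order_date", "2024-06-01"))
```

After each successful run, the highest `order_date` (or `order_id`) loaded becomes the watermark for the next run.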


By following these patterns, you can efficiently and reliably query and move data from MonetDB to downstream systems while handling type conversion and minimizing data movement.

Data Integration Architecture

Loading data into Snowflake in advance of integration is an effective "divide and conquer" strategy that splits the process into two distinct steps: first, extracting and loading raw data, and second, performing transformations and integrations. This approach is a core advantage of the Extract, Load, and Transform (ELT) architecture. Data integration inherently requires significant data transformation, and the most effective method for handling this is through the use of dedicated data transformation pipelines. A further benefit of the ELT approach is that these transformations and integrations take place entirely within the target Snowflake database itself. This ensures processing is both fast and on-demand, leveraging Snowflake’s high-performance, scalable compute resources. Moreover, it eliminates the need and associated costs for separate data processing infrastructure, as all workloads are executed within the existing Snowflake environment.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.