Integrate data from MonetDB to Amazon Redshift using Matillion

Our MonetDB to Redshift connector seamlessly transfers your data to Redshift within minutes, ensuring it remains current without requiring manual coding or the management of complex ETL scripts.


Extracting data from MonetDB to Amazon Redshift

Extracting data from MonetDB is a crucial step in enabling seamless data integration and analysis, especially when loading data into powerful cloud data warehouses like Amazon Redshift. Ensuring a smooth migration process requires careful preparation and execution at each stage. In this article, we will guide you through all key aspects of this process: establishing an identity in MonetDB to ensure secure and organized data access, verifying or obtaining the necessary JDBC driver within Matillion for smooth connectivity, and configuring network connectivity between your MonetDB source and Redshift target environments. We will also discuss best practices for extracting your data—covering both initial full loads and efficient incremental refreshes—to maximize data integrity and minimize downtime. By following these steps, you can optimize your end-to-end ETL workflow from MonetDB to Redshift.


What is MonetDB?

MonetDB is an open-source, high-performance analytical column-store database designed for complex queries on large datasets. Its column-oriented architecture, vectorized execution engine, and automatic indexing accelerate OLAP workloads involving aggregates and joins. MonetDB supports advanced SQL features, SQL/MED, and geospatial data, setting it apart from traditional row-oriented databases. Optimized for modern hardware, it efficiently uses memory and CPU caches, enabling multi-core parallel processing and concurrent analytics. MonetDB is ideal for data warehousing, business intelligence, and scientific data management that require fast, interactive exploration of data at scale.


What is Amazon Redshift?

Amazon Redshift is a fully managed, petabyte-scale AWS data warehouse designed for large analytics workloads. Using columnar storage and massively parallel processing (MPP), it optimizes query speed and storage for complex queries on big datasets. Redshift integrates with AWS services, enabling easy data loading from S3, DynamoDB, and more, and features automated backups, encryption, and scaling. With Redshift Spectrum, users can directly query data across S3 data lakes for federated analytics, removing the need for data movement. Supporting standard SQL and BI tool integration, Redshift is widely used for enterprise data warehousing, modeling, and business analytics.

Why Move Data from MonetDB into Amazon Redshift

The Case for Transferring Data from MonetDB to Redshift for Advanced Analytics

A data engineer or architect might choose to copy data from MonetDB into Redshift for several compelling reasons. Primarily, MonetDB may host data that holds significant potential value for the organization's analytics and decision-making processes. By integrating this data with information from other sources within Redshift, the organization can uncover deeper insights and generate more comprehensive reports. Crucially, performing data integration and analysis within Redshift allows the engineer to leverage Redshift’s scalable architecture and advanced analytical capabilities, without imposing additional workload or complexity on the MonetDB system itself. This approach minimizes the risk of performance degradation on MonetDB, ensuring it continues to operate efficiently while maximizing the utility of its data through well-orchestrated integration and analysis on Redshift.

Creating a User in MonetDB

To add a new user identity in a MonetDB database, you will execute SQL statements either from a MonetDB client such as mclient or from an administrative interface. This process requires you to have administrative privileges (i.e., you must be a database administrator, often aliased as user monetdb).

Prerequisites

  • Access to a running MonetDB instance.
  • Appropriate rights to connect and execute statements as an administrator.

Step 1: Log In as an Administrative User

Connect to your database as the `monetdb` user or another user with administrative privileges. For example, from the command line:

mclient -d <database_name> -u monetdb

Replace `<database_name>` with your actual database name. mclient will prompt you for the password; alternatively, credentials can be stored in a `.monetdb` configuration file so the prompt is skipped.

Step 2: Create the User

Use the `CREATE USER` SQL statement. The basic syntax is:

CREATE USER "<username>" WITH PASSWORD '<password>';

  • `<username>`: The user name. User names are case-sensitive when double-quoted.
  • `<password>`: The user's password.

Note that, depending on your MonetDB version, the statement may also require `NAME '<full name>'` and `SCHEMA "<schema>"` clauses; consult the documentation for your release.

Example:

CREATE USER "reporting_user" WITH PASSWORD 'Secur3P@ssw0rd';

Step 3: Grant Privileges

By default, new users have no access privileges. Grant rights according to your needs, for example:

GRANT SELECT ON <schema>.<table> TO "reporting_user";
-- Or, to grant all privileges on all tables in the schema:
GRANT ALL ON SCHEMA <schema> TO "reporting_user";

Replace `<schema>` and `<table>` as appropriate.

Step 4: (Optional) List All Users

To see a list of defined users:

SELECT * FROM users;

This allows you to verify that the new user was added successfully.


Note: Changing a password can be accomplished via:

ALTER USER "<username>" SET PASSWORD '<new_password>';

And to remove a user:

DROP USER "<username>";

Refer to the MonetDB SQL documentation for further details on advanced privilege management.

Installing the JDBC driver

At the time of writing, the MonetDB JDBC driver is not bundled by default with Matillion Data Productivity Cloud, as is common for third-party drivers with their own licensing and redistribution terms. Consequently, to connect Matillion to a MonetDB database, you must manually download and install the driver.

Follow these instructions to add the MonetDB JDBC driver to your Matillion Data Productivity Cloud environment:

1. Download the MonetDB JDBC Driver

  • Visit the MonetDB JDBC driver download page: https://www.monetdb.org/Documentation/JDBC.
  • Locate and download the appropriate version. Look specifically for the Type 4 JDBC driver, as it is purely Java-based and does not require native libraries. Type 4 drivers are generally the recommended option for Matillion integration.

2. Upload the Driver to Matillion Data Productivity Cloud

  • To add the downloaded MonetDB JDBC driver JAR file to your Matillion environment, refer to the instructions provided by Matillion: Uploading External Drivers.
  • In summary, you will generally need to access the admin console for your Matillion Agent, locate the external drivers management section, and upload the MonetDB JDBC driver JAR file you downloaded earlier.

3. Configure and Use the Driver

  • After installing the driver, configuration is required to start using it within Matillion. For procedures on how to set up and connect to databases using your installed JDBC driver, refer to: Database Query Usage Instructions.
  • This documentation will detail how to create database connections, configure authentication, and manage queries via the installed MonetDB JDBC driver.

Following these instructions will enable connectivity between Matillion Data Productivity Cloud and your MonetDB database using the appropriate Type 4 JDBC driver.
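Once the driver JAR is installed, Matillion's connection configuration will need a MonetDB JDBC URL. As a minimal sketch, the helper below assembles one; the `jdbc:monetdb://host:port/database` format and default port 50000 come from the MonetDB JDBC documentation, while the hostname and database name are placeholder assumptions:

```python
# Sketch: assemble the JDBC connection URL used when configuring the
# Database Query component against the installed MonetDB driver.
def monetdb_jdbc_url(host: str, database: str, port: int = 50000) -> str:
    """Build a MonetDB JDBC URL (jdbc:monetdb://host:port/database)."""
    return f"jdbc:monetdb://{host}:{port}/{database}"

# Hypothetical host and database names, for illustration only.
print(monetdb_jdbc_url("monetdb.internal.example.com", "analytics"))
# jdbc:monetdb://monetdb.internal.example.com:50000/analytics
```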

Checking network connectivity

When configuring connectivity between Matillion Data Productivity Cloud and your MonetDB database, you must ensure that the MonetDB instance accepts incoming connections from wherever your Matillion agent runs: from Matillion's managed infrastructure in a Full SaaS deployment, or from your own network in a Hybrid SaaS deployment. In practice this usually means opening the MonetDB listener port (50000 by default) in any firewalls or security groups between the agent and the database.

Additionally, if you are referencing the MonetDB database using a DNS hostname, ensure that the Full SaaS or Hybrid SaaS agent can resolve this DNS address to the correct IP address. This may require updating your DNS settings or ensuring that your network environment provides the necessary DNS resolution capabilities.
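Before configuring the pipeline, it can save time to confirm reachability from the machine hosting your agent. The sketch below checks DNS resolution and TCP connectivity using only the Python standard library; the hostname is a placeholder, and 50000 is MonetDB's default port:

```python
# Sketch: pre-flight checks for MonetDB reachability from the agent host.
import socket

def resolves(hostname: str) -> bool:
    """Return True if the hostname resolves to an IP address."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        return False

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    host, port = "monetdb.internal.example.com", 50000  # placeholders
    print("DNS resolves:", resolves(host))
    print("Port reachable:", port_open(host, port))
```

If the DNS check fails from the agent but succeeds elsewhere, the fix is usually in the agent network's resolver configuration rather than in MonetDB itself.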

Querying Data from MonetDB

MonetDB is a high-performance column-oriented database optimized for analytical queries. To query data from MonetDB, you use standard SQL syntax via SQL clients, integration tools, or ETL platforms.

Example MonetDB SQL SELECT Statements

Below are some basic examples:

```
-- Select all columns and rows from a table
SELECT * FROM sales_orders;

-- Select specific columns with a filter
SELECT order_id, customer_id, order_date
FROM sales_orders
WHERE order_date >= '2024-01-01';

-- Aggregate and group data
SELECT customer_id, SUM(order_total) AS total_spent
FROM sales_orders
GROUP BY customer_id;

-- Join multiple tables
SELECT so.order_id, c.customer_name
FROM sales_orders so
JOIN customers c ON so.customer_id = c.id
WHERE so.status = 'SHIPPED';
```
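The queries above are standard SQL. As a quick, self-contained illustration of the aggregate example, the sketch below runs the same `GROUP BY` query against an in-memory SQLite database standing in for MonetDB (the table name and sample rows are invented for the demo; the SQL shape is unchanged):

```python
# Sketch: the aggregate/group-by example above, executed against SQLite
# as a stand-in for MonetDB. The SQL itself is portable between the two.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_orders (order_id INTEGER, customer_id INTEGER, order_total REAL)"
)
conn.executemany(
    "INSERT INTO sales_orders VALUES (?, ?, ?)",
    [(1, 100, 25.0), (2, 100, 75.0), (3, 200, 40.0)],  # sample data
)

rows = conn.execute(
    "SELECT customer_id, SUM(order_total) AS total_spent "
    "FROM sales_orders GROUP BY customer_id"
).fetchall()
print(sorted(rows))  # [(100, 100.0), (200, 40.0)]
```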

Datatype Conversion: MonetDB vs. Redshift

When integrating MonetDB with Amazon Redshift (e.g., via ETL), be mindful of datatype differences. Some conversions may be necessary; for example:

| MonetDB Datatype | Redshift Equivalent |
| --- | --- |
| VARCHAR | VARCHAR |
| INTEGER | INTEGER |
| BIGINT | BIGINT |
| DOUBLE | DOUBLE PRECISION |
| TIMESTAMP | TIMESTAMP |
| BOOLEAN | BOOLEAN |

Review your ETL / migration tooling documentation for specifics about how datatypes are mapped or converted.
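If you script any part of the schema translation yourself, the table above can be expressed as a simple lookup. This is an illustrative sketch only; real ETL tools maintain far more complete mappings, and the failure behavior for unmapped types is an assumption:

```python
# Sketch: MonetDB -> Redshift type lookup based on the mapping table above.
MONETDB_TO_REDSHIFT = {
    "VARCHAR": "VARCHAR",
    "INTEGER": "INTEGER",
    "BIGINT": "BIGINT",
    "DOUBLE": "DOUBLE PRECISION",
    "TIMESTAMP": "TIMESTAMP",
    "BOOLEAN": "BOOLEAN",
}

def redshift_type(monetdb_type: str) -> str:
    """Map a MonetDB type name to its Redshift equivalent, if known."""
    try:
        return MONETDB_TO_REDSHIFT[monetdb_type.upper()]
    except KeyError:
        raise ValueError(f"No known Redshift mapping for {monetdb_type!r}")

print(redshift_type("double"))  # DOUBLE PRECISION
```

Failing loudly on an unmapped type (rather than guessing) is a deliberate choice here: silent fallbacks tend to surface later as load errors or truncated data.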

Initial and Incremental Loads Pattern

A robust pattern when extracting data from MonetDB involves:

  • Initial Load: Extract all data once (e.g., for a brand new replication or migration).
  • Incremental Loads: Periodically extract only new or updated data based on a filter (e.g., a timestamp or primary key range).

Both operations can use the same Database Query component in your ETL workflow.

Initial Load Query

During the initial load, run a simple query without filters:

SELECT *
FROM sales_orders;

Incremental Load Query

For incremental loads, add a filter clause. For example, if tracking by last modified date:

SELECT *
FROM sales_orders
WHERE last_modified > '2024-06-01 00:00:00';

Or use a high-water-marking technique, such as loading only records with an `order_id` greater than a previously stored value:

SELECT *
FROM sales_orders
WHERE order_id > 100000;

For an in-depth guide on this ETL pattern, see Matillion's article on incremental load data replication strategy.
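The high-water-mark pattern above can be sketched end to end in a few lines. This demo uses SQLite in place of MonetDB and a plain variable in place of the persisted watermark your ETL tool would maintain; the table and values are invented for illustration:

```python
# Sketch: high-water-mark incremental extraction, with SQLite standing in
# for MonetDB and a local variable standing in for persisted ETL state.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_orders (order_id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO sales_orders VALUES (?, ?)",
    [(99999, "SHIPPED"), (100001, "NEW"), (100002, "NEW")],  # sample data
)

watermark = 100000  # highest order_id loaded by the previous run

# Extract only rows beyond the stored watermark, as in the query above.
new_rows = conn.execute(
    "SELECT order_id FROM sales_orders WHERE order_id > ? ORDER BY order_id",
    (watermark,),
).fetchall()

# Advance the watermark so the next run extracts only later rows.
if new_rows:
    watermark = new_rows[-1][0]

print([r[0] for r in new_rows])  # [100001, 100002]
print(watermark)  # 100002
```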

Data Integration Architecture

One of the key advantages of the ELT (Extract, Load, Transform) architecture is its ability to break down data integration into two distinct steps—loading data in advance before starting integration. This "divide and conquer" approach helps organizations manage complexity, as they can quickly ingest raw data and then systematically focus on necessary transformation and integration tasks. Data integration inherently involves transforming disparate datasets to yield meaningful, analyzable outputs, and data transformation pipelines provide the optimal means to accomplish these tasks efficiently. Another notable benefit of ELT is that both data transformation and integration are performed within the target Redshift database itself. This makes the process fast, scalable, and available on demand, while also eliminating the need—and associated cost—of maintaining separate data processing infrastructure outside of Redshift.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.