Extracting data from eXtremeDB to Amazon Redshift
Extracting data from eXtremeDB is a common requirement for organizations seeking to integrate operational data with analytical platforms such as Amazon Redshift. Migrating data efficiently and reliably involves multiple steps, from establishing secure connections to managing data consistency. In this article, we will guide you through the process of extracting data from eXtremeDB and loading it into Redshift, with a particular focus on users employing Matillion for orchestration. We will begin by outlining the creation of an appropriate identity (such as a user or service account) within eXtremeDB to facilitate secure, auditable access. For Matillion users, we will explain how to check for the necessary JDBC driver and, if needed, how to acquire and configure it. Next, we will address the importance of confirming network connectivity between your eXtremeDB instance, Matillion environment, and Redshift, ensuring smooth data flow. Finally, we will cover strategies for querying and extracting your data. This includes both the initial full data load and recommended practices for subsequent incremental extractions. By the end of this article, you will have a clear roadmap for enabling seamless and performant data movement from eXtremeDB to Redshift.
What is eXtremeDB?
eXtremeDB, developed by McObject, is an embedded in-memory DBMS for resource-constrained environments like IoT, telecom, and real-time systems. It supports in-memory and persistent storage in a hybrid model to balance speed and durability. With a lightweight footprint, native C/C++ API, SQL access, and support for multiple languages, it integrates easily across platforms. eXtremeDB provides high availability, ACID compliance, and specialized modules for time series and analytics, ensuring deterministic, low-latency performance. Its modular design lets developers include only needed features, reducing overhead and meeting strict memory and performance requirements for mission-critical applications.
What is Amazon Redshift?
Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud, designed for fast query performance on large datasets. It leverages columnar storage, parallel processing, and advanced compression techniques to efficiently handle analytic workloads typical in business intelligence and data warehousing scenarios. Redshift is compatible with standard SQL and various BI tools, enabling seamless integration into existing data ecosystems. The service features automated backups, scalability through cluster resizing, and supports both on-demand and Reserved Instance pricing models. Furthermore, Redshift Spectrum allows querying data directly from Amazon S3, facilitating a flexible lake house architecture and reducing data movement overhead.
Why Move Data from eXtremeDB into Amazon Redshift
The Benefits of Integrating eXtremeDB Data into Redshift for Advanced Analytics
A data engineer or architect might choose to copy data from eXtremeDB into Amazon Redshift for several compelling reasons. Primarily, eXtremeDB often hosts data that is potentially valuable for organizational analytics, reporting, or decision-making; however, its value is amplified when combined with data from other enterprise sources. By integrating eXtremeDB data within Redshift, professionals can leverage Redshift’s high-performance, scalable analytics capabilities alongside data from a diverse array of systems, facilitating deeper and more holistic insights. Importantly, performing resource-intensive data integration and analysis tasks in Redshift, rather than directly in eXtremeDB, ensures that the transactional workload and real-time responsiveness of eXtremeDB remain unaffected, preserving its performance for mission-critical operational processing.
Creating an Identity in eXtremeDB
eXtremeDB provides user and role-based authentication to manage access to the database. To create a user (identity), you will execute SQL statements via an appropriate client tool (such as mcsqled, the interactive SQL CLI) or through your application using eXtremeSQL APIs.
Below are the steps and example scripts to create a user in an eXtremeDB database:
1. Connect to the Database
First, connect to your eXtremeDB SQL server using your preferred tool. For example:
```bash
mcsqled --host <hostname> --port <port> --user admin --password <admin_password>
```
2. Create a User
Use the standard `CREATE USER` SQL statement to define a new user identity. Substitute your desired username and password.

```sql
CREATE USER alice IDENTIFIED BY 'a_secure_password';
```

This creates a user named `alice` with the specified password.
3. Grant Privileges
By default, the new user does not have access to database objects. Grant privileges with SQL's `GRANT` statement:

```sql
-- Grant usage on the database
GRANT USAGE TO alice;

-- Grant permissions on schema objects
GRANT SELECT, INSERT, UPDATE, DELETE ON customers TO alice;
```

Replace `customers` with your actual table, and adjust privileges as needed.
4. List Users (Optional)
To verify the current users:
```sql
SELECT * FROM INFORMATION_SCHEMA.SYSTEM_USERS;
```
or for roles (if needed):
```sql
SELECT * FROM INFORMATION_SCHEMA.SYSTEM_ROLES;
```
Important Notes
- The syntax above applies to eXtremeDB SQL (eXtremeSQL) environments with authentication enabled.
- Only a user with sufficient privileges (typically `admin`) can create other users.
- For more details, consult the eXtremeDB SQL Reference under "User Identification and Authorization."
Installing the JDBC Driver
The eXtremeDB JDBC driver is not distributed by default with Matillion Data Productivity Cloud, owing to licensing and redistribution restrictions. Therefore, to use eXtremeDB as a data source within Matillion, you must manually acquire and install the relevant JDBC driver.
Step 1: Download the JDBC Driver
Visit the official eXtremeDB download page at https://www.mcobject.com/products/extremedb/.
When downloading, look specifically for the “Type 4” JDBC driver, as this version works natively with Java and requires no native library dependencies, making it best suited for cloud and container environments such as Matillion.
Step 2: Review Vendor Licensing Requirements
Since redistribution is restricted, you may need to accept a license agreement or sign up for access before downloading. Follow the prompts and ensure you understand the licensing terms before proceeding with use in production or evaluation.
Step 3: Install the JDBC Driver in Matillion Data Productivity Cloud
Matillion does not ship the eXtremeDB JDBC driver by default, and users must upload it as an external component. To do this, follow the installation instructions outlined by Matillion at https://docs.matillion.com/data-productivity-cloud/agent/docs/uploading-external-drivers/.
This resource provides detailed, up-to-date steps for driver upload, prerequisites, and troubleshooting tips, ensuring a seamless integration process.
Step 4: Configure and Use the eXtremeDB Connection
Once the driver is uploaded and recognized by your Matillion environment, you can reference it when setting up your database connection or configuring database query components. For precise configuration procedures, consult the usage instructions at https://docs.matillion.com/data-productivity-cloud/designer/docs/database-query/.
These materials guide you in specifying driver class names, connection URLs, authentication, and relevant parameter settings to ensure reliable connectivity to eXtremeDB.
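As an illustrative sketch of what those settings look like, the snippet below assembles a JDBC URL for a Database Query component. Note that the `jdbc:extremedb://` URL scheme and the `DRIVER_CLASS` value are placeholders assumed for illustration, not confirmed values — take the exact driver class name and URL format from the documentation bundled with your eXtremeDB driver JAR.

```python
# Sketch: assembling JDBC connection settings for a Matillion Database
# Query component. The driver class name and URL scheme below are
# ASSUMPTIONS for illustration -- confirm them against your driver docs.

def build_jdbc_url(host: str, port: int, database: str) -> str:
    """Build a JDBC connection URL (hypothetical 'extremedb' scheme)."""
    return f"jdbc:extremedb://{host}:{port}/{database}"

# Hypothetical driver class name -- replace with the value documented
# for the driver JAR you uploaded to Matillion.
DRIVER_CLASS = "com.mcobject.jdbc.Driver"

url = build_jdbc_url("db.example.internal", 5001, "orders_db")
print(url)  # jdbc:extremedb://db.example.internal:5001/orders_db
```

In Matillion, these values map onto the component's driver class and connection URL fields, with the username and password supplied separately as credentials.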
Checking network connectivity
To establish a successful connection between Matillion Data Productivity Cloud and your eXtremeDB database, you must ensure that your eXtremeDB instance is configured to allow incoming network connections, depending on your deployment scenario:
- **Full SaaS Agent Configuration:** Your eXtremeDB database should allow incoming connections from the static IP addresses used by the Matillion Full SaaS service. The latest list of these IP addresses is available at: Matillion IP Addresses.
- **Hybrid SaaS Deployment:** If you are using a Hybrid SaaS deployment, configure your eXtremeDB security settings to allow incoming connections from your own Virtual Private Cloud (VPC). Utilities to help you check network access and discover your egress IP addresses can be found here: Network Access Check Utilities.
Additionally, if you are referencing your eXtremeDB database using a DNS address, ensure that the Matillion Full SaaS or Hybrid SaaS agents can resolve the database’s DNS hostname to its current IP address. Proper DNS resolution is essential for establishing connectivity.
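A quick way to verify both DNS resolution and TCP reachability from the agent's host is a small script like the one below. The hostname and port are placeholders — substitute your eXtremeDB server's address and its listening port.

```python
# Sketch: connectivity pre-checks for the eXtremeDB endpoint.
import socket

def resolve_hostname(hostname: str) -> list[str]:
    """Return the IP addresses a hostname resolves to (DNS check)."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})

def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Attempt a TCP connection to host:port (network access check)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Replace with your database's DNS name and SQL port.
print(resolve_hostname("localhost"))
print(can_connect("localhost", 5001))
```

If DNS resolution fails here, fix name resolution for the agent before debugging firewall rules; if resolution succeeds but the TCP connection fails, the issue is usually a security group, firewall, or routing rule.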
Querying Data from eXtremeDB
This section covers how to query data from an eXtremeDB database using SQL SELECT statements, datatype considerations when migrating to Redshift, and best practices for loading data in ETL workflows.
1. Example Queries: SQL SELECT Statements
eXtremeDB supports SQL query syntax for retrieving data. Below are sample queries:
```sql
-- Retrieve all rows and columns from the 'customers' table
SELECT * FROM customers;

-- Retrieve specific columns from the 'orders' table
SELECT order_id, customer_id, order_date FROM orders;

-- Filter data with a WHERE clause
SELECT * FROM orders
WHERE order_date >= '2024-01-01' AND status = 'Shipped';

-- Aggregate results with GROUP BY
SELECT customer_id, COUNT(*) AS total_orders
FROM orders
GROUP BY customer_id;
```
eXtremeDB SQL is generally ANSI-compliant, but certain advanced functions or types may differ.
2. Datatype Conversion: eXtremeDB to Redshift
When replicating data from eXtremeDB to Amazon Redshift, datatype conversion may be necessary. For example:
| eXtremeDB Datatype | Equivalent Redshift Datatype |
|---|---|
| `INTEGER` | `INTEGER` |
| `FLOAT` | `FLOAT4` / `FLOAT8` |
| `CHAR` | `CHAR` |
| `VARCHAR` | `VARCHAR` |
| `DATE` | `DATE` |
| `TIME` | `TIME` |
Tip: Validate and map non-standard or custom types for compatibility during ETL.
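To make that validation step concrete, a simple lookup following the mapping table above might look like this. The fallback to `VARCHAR` for unrecognized types is an illustrative convention, not a rule — flag such columns for manual review.

```python
# Sketch: translating eXtremeDB column types to Redshift types, per the
# mapping table above. Unknown or custom types fall back to VARCHAR and
# should be reviewed manually during ETL design.

TYPE_MAP = {
    "INTEGER": "INTEGER",
    "FLOAT": "FLOAT8",   # use FLOAT4 if single precision suffices
    "CHAR": "CHAR",
    "VARCHAR": "VARCHAR",
    "DATE": "DATE",
    "TIME": "TIME",
}

def to_redshift_type(source_type: str) -> str:
    """Map an eXtremeDB type name to a Redshift type; default to VARCHAR."""
    return TYPE_MAP.get(source_type.upper(), "VARCHAR")

print(to_redshift_type("float"))        # FLOAT8
print(to_redshift_type("custom_type"))  # VARCHAR
```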
3. Optimal Data Load Pattern
Initial Load
The recommended best practice is to execute a full (one-off) initial load of the source data, followed by incremental loads.
- Initial Load: Ingest all records, typically with an unfiltered SQL query.
```sql
-- Example initial load query (no filter)
SELECT * FROM orders;
```
Incremental Load
For subsequent refreshes, only changed records are loaded by applying a filter (e.g., using a timestamp or numeric ID).
```sql
-- Example incremental load query (using a filter)
SELECT * FROM orders WHERE last_modified > '2024-06-01 00:00:00';
```
- The same Database Query component (e.g., in Matillion ETL) is used for both initial and incremental loads.
- For the incremental load, parameterize your filter on a replication key (`last_modified`, etc.).
Read more: Incremental Load Data Replication Strategy – Matillion Exchange
References:

- eXtremeDB SQL Reference (McObject)
- AWS Redshift Datatype Mapping
Data Integration Architecture
Loading data into Amazon Redshift in advance of integration is a key strategy for dividing and conquering complex data management challenges, as it allows organizations to split the process into a dedicated data loading step followed by separate integration and transformation. This approach exemplifies the Extract, Load, Transform (ELT) architecture, which differs from traditional ETL by decoupling the extraction and loading of raw data from its subsequent transformation. Data integration inevitably requires transformation to harmonize schema, clean data, and prepare it for analytics; the most effective way to achieve this is through well-defined data transformation pipelines. In the ELT model, these transformations are executed entirely within the target Redshift database, which brings the benefit of rapid, on-demand, and highly scalable processing that leverages Redshift’s native computation power. This eliminates the need for separate, dedicated data transformation infrastructure, reducing overall complexity and costs and allowing organizations to fully harness Redshift’s performance for seamless integration workflows.