Integrate data from DolphinDB to Amazon Redshift using Matillion

Our DolphinDB to Redshift connector streamlines the process of transferring your data to Redshift within minutes, maintaining up-to-date information seamlessly without the need for manual coding or managing intricate ETL scripts.


Extracting data from DolphinDB to Amazon Redshift

Extracting data from DolphinDB is an essential step for many organizations looking to integrate their high-performance time series data with cloud-based analytics platforms such as Amazon Redshift. This article provides a step-by-step guide for successfully transferring data from DolphinDB to Redshift, focusing on configuration and best practices. We will begin by outlining how to create an identity within DolphinDB, ensuring secure and proper access to the data. For users working with Matillion, we will detail how to check for, or acquire, the correct JDBC driver for DolphinDB to facilitate seamless connectivity. Next, we will address critical considerations around establishing and verifying network connectivity between your DolphinDB source and Redshift target environments. Finally, we will demonstrate methods for querying and extracting your data—whether you are performing a full initial load or implementing an incremental extract to keep Redshift up to date. By following this guide, you will be equipped with the knowledge required to establish a reliable and efficient integration pipeline from DolphinDB to Redshift.


What is DolphinDB?

DolphinDB is a high-performance, distributed time series database designed for large-scale data analytics, particularly in financial and scientific domains. Built with a columnar storage architecture and integrated computation engine, DolphinDB allows users to efficiently ingest, store, process, and analyze massive volumes of time-stamped data. Its native support for vectorized operations, in-memory computing, and a proprietary scripting language (resembling SQL and Python) enables fast, flexible querying and complex analytics. DolphinDB offers horizontal scalability through clustering, making it well-suited for mission-critical workloads such as algorithmic trading, risk analysis, and sensor data monitoring, while providing compatibility with common data formats and APIs for seamless integration with existing pipelines.


What is Amazon Redshift?

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud, designed for fast query performance on large datasets. Built atop PostgreSQL, Redshift extends its capabilities with a columnar storage architecture and parallel processing, enabling efficient analytical queries over vast amounts of structured data. Redshift integrates seamlessly with AWS's ecosystem, supporting advanced security features, automated backups, and scaling via clusters and concurrency scaling. Its Massively Parallel Processing (MPP) engine, combined with data compression and distribution strategies, allows technical teams to optimize storage, reduce query latency, and efficiently serve data warehousing workloads for business intelligence and reporting applications.

Why Move Data from DolphinDB into Amazon Redshift

Key Benefits of Transferring Data from DolphinDB to Redshift for Advanced Analytics

A data engineer or architect might choose to copy data from DolphinDB into Redshift for several compelling reasons. DolphinDB often contains valuable datasets, such as high-frequency trading or time-series financial data, that can support a broad range of analytical and business intelligence initiatives. By transferring this data into Redshift, it becomes possible to integrate and correlate DolphinDB's rich data with information from other enterprise sources, thereby unlocking deeper insights and facilitating more comprehensive analytics. Furthermore, performing such data transformations within Redshift helps alleviate computational demands on DolphinDB, ensuring that DolphinDB continues to operate efficiently as a transaction or ingestion engine, while Redshift takes on the heavier workloads associated with analytics, reporting, and cross-source data integration.

Creating a User in DolphinDB

To manage access control, DolphinDB supports user authentication and authorization. Users are assigned different roles and permissions to interact with the database. Below are step-by-step instructions for creating a user in DolphinDB.

1. Ensure Permissions

Only users with the admin role or sufficient privileges can create new users. Make sure you are logged in as an administrator.

2. Start DolphinDB Server with Authentication Enabled

Edit the server configuration (e.g., dolphinDB.cfg):

enableAuthentication=true

Restart the DolphinDB server if you made changes.
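For reference, a minimal single-node configuration with authentication enabled might look like the fragment below. The localSite value is illustrative; use your own host, port, and node alias.

```
localSite=localhost:8848:local8848
enableAuthentication=true
```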

3. Connect to DolphinDB

Establish a session to the DolphinDB server either through the web interface, DolphinDB Studio, or by using a DolphinDB client API.

4. Create a User Account

Execute the following script to create a new user:

createUser("username", "password")

Parameters:

  • username: The new user's unique identifier (string).
  • password: The password for the new user (string).

Example

createUser("alice", "securePassword123")

This command creates a user named alice with the password securePassword123.

5. Assign Roles or Permissions

By default, a new user does not have elevated permissions. Grant additional roles as needed:

grantRole("alice", "developer")

Possible roles include: admin, developer, and reader. You may grant multiple roles if necessary.

6. Confirm User Creation

You can verify the list of existing users with:

getUserNames()

To check a specific user's roles:

getUserPermittedRoles("alice")


Note:
- Use strong passwords for all users.
- Keep all user management operations restricted to trusted administrators.
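Steps 4 to 6 above can be scripted rather than typed interactively. The sketch below composes them into a single DolphinDB script string in Python; the helper name is hypothetical, and in practice you would send the result to the server with the DolphinDB Python API (session.run) or paste it into DolphinDB Studio.

```python
def user_setup_script(username, password, roles=()):
    """Compose the statements from steps 4-6 as one DolphinDB script.

    Illustrative helper: the output is meant to be sent to the server
    via the DolphinDB Python API or pasted into DolphinDB Studio.
    """
    lines = [f'createUser("{username}", "{password}")']
    for role in roles:                       # step 5: optional role grants
        lines.append(f'grantRole("{username}", "{role}")')
    lines.append("getUserNames()")           # step 6: confirm creation
    return "\n".join(lines)

script = user_setup_script("alice", "securePassword123", roles=["developer"])
print(script)
```

Batching the commands this way keeps user setup repeatable across environments.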

Installing the JDBC driver

At the time of writing, the JDBC driver required to connect to DolphinDB is not bundled with Matillion Data Productivity Cloud. This is due to licensing and redistribution restrictions, meaning that you'll need to obtain and install the driver manually before connecting to DolphinDB from your Matillion environment.

Follow the steps below to properly download and install the JDBC driver:

1. Download the DolphinDB JDBC Driver

  1. Navigate to the official DolphinDB download page:
    http://www.dolphindb.com/downloads.html
  2. Locate and download the Type 4 JDBC driver for DolphinDB. Type 4 drivers are fully implemented in Java and offer better compatibility and performance for cloud environments such as Matillion Data Productivity Cloud.
  3. Save the JAR file to your local system. Make note of the location, as you will need it for the next step.

2. Upload the Driver to Matillion Data Productivity Cloud

Once you have the JDBC driver JAR file, follow Matillion's process for adding external JDBC drivers:

  1. Visit the Matillion documentation for uploading drivers:
    Uploading External Drivers
  2. Follow the step-by-step guidance to upload your downloaded DolphinDB JDBC driver into your Matillion Data Productivity Cloud environment.
  3. Ensure that the upload completes successfully and the driver is enabled for use within your organization’s Matillion account.

3. Configure and Use the Driver

After the JDBC driver is installed, you are ready to connect to DolphinDB within Matillion:

  1. Review Matillion’s instructions for using database query components:
    Database Query Component Usage Instructions
  2. Use the Matillion interface to configure a connection to your DolphinDB database, referencing the newly uploaded JDBC driver as the connection method.
  3. Test your connection to ensure authentication and connectivity are working as expected.

By completing these steps, you enable secure and licensed connectivity between Matillion Data Productivity Cloud and your DolphinDB database using the preferred Type 4 JDBC driver.

Checking network connectivity

To successfully connect Matillion Data Productivity Cloud to your DolphinDB database, you must ensure that incoming connections to DolphinDB are allowed based on your deployment configuration:

Full SaaS Agent Configuration:
If you are using the Full SaaS agent, you must allow incoming connections from the specific IP addresses used by Matillion’s cloud service. The required IP addresses are maintained at the following location: Matillion Allowed IP Addresses. Make sure your DolphinDB instance and any associated network security groups (such as firewall rules or access control lists) permit access from these addresses.

Hybrid SaaS Deployment:
In a Hybrid SaaS deployment, your Matillion agent runs inside your own infrastructure. In this case, you must configure your DolphinDB database to allow incoming connections from your own virtual private cloud (VPC) or internal network where the Matillion agent is running. To verify connectivity from within your VPC, you can use the network access check utilities available here: Matillion Network Access Checker.

DNS Resolution:
If your DolphinDB database is referenced using a DNS name (rather than a direct IP address), ensure that the Matillion Full SaaS or Hybrid SaaS agent can resolve the hostname to the correct IP address. Proper DNS configuration is required for establishing a network connection to your DolphinDB instance.
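Before configuring the Matillion connection, you can sanity-check both DNS resolution and port reachability from the machine where the agent runs. The sketch below is a minimal stand-in for those checks; the hostname is a placeholder, and 8848 is DolphinDB's default port (adjust if your deployment differs).

```python
import socket

def can_reach(host, port, timeout=3.0):
    """Return True if `host` resolves in DNS and accepts a TCP
    connection on `port`; a quick proxy for the checks above."""
    try:
        addr = socket.gethostbyname(host)        # DNS resolution step
    except socket.gaierror:
        return False
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            return True                          # port is reachable
    except OSError:
        return False

# Placeholder hostname; substitute your DolphinDB instance.
print(can_reach("dolphindb.internal.example.com", 8848))
```

A False result distinguishes quickly between a DNS problem and a firewall or security-group problem when combined with a direct IP test.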

Querying Data from a DolphinDB Database

This guide provides instructions for querying data from a DolphinDB database, including example queries, considerations for datatype conversion (especially with Redshift), and recommended patterns for initial and incremental data loading.


Querying Data with DolphinDB SQL Statements

DolphinDB supports SQL-like queries for retrieving data. Below are some example queries:

```
-- Select all columns from a table
SELECT * FROM trades

-- Select specific columns with filtering conditions
SELECT symbol, price, volume FROM trades WHERE price > 100

-- Aggregate data by date
SELECT tradedate, SUM(volume) AS total_volume FROM trades GROUP BY tradedate

-- Join two tables
SELECT a.symbol, a.price, b.company_name
FROM trades AS a
JOIN companies AS b ON a.symbol = b.symbol
```

These queries can be executed using a compatible database client or through an ETL tool with a Database Query component.


Datatype Conversion Considerations

When moving data between DolphinDB and Redshift, automatic or manual datatype conversion may occur. Here are common considerations:

  • Integers: DolphinDB int and long map to Redshift INTEGER and BIGINT.
  • Decimals: DolphinDB double maps to Redshift DOUBLE PRECISION.
  • Strings: DolphinDB string maps to Redshift VARCHAR.
  • Timestamps/Dates: Ensure DolphinDB datetime and timestamp columns are compatible with Redshift's TIMESTAMP or DATE types.

Carefully review the schema mappings to avoid data truncation or type mismatch errors during data ingestion.
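One way to make the review concrete is to encode the mapping above and generate the target DDL from a description of the source table. The sketch below is illustrative only: the mapping dictionary covers just the pairs listed above, and the VARCHAR length is an assumption you should size to your own data.

```python
# Illustrative mapping based on the pairs listed above; verify it
# against your own schemas before relying on it.
DOLPHINDB_TO_REDSHIFT = {
    "INT": "INTEGER",
    "LONG": "BIGINT",
    "DOUBLE": "DOUBLE PRECISION",
    "STRING": "VARCHAR(256)",   # assumed length; size to your data
    "DATETIME": "TIMESTAMP",
    "TIMESTAMP": "TIMESTAMP",
    "DATE": "DATE",
}

def redshift_ddl(table, columns):
    """Build a Redshift CREATE TABLE statement from a
    {column_name: dolphindb_type} description of the source table."""
    cols = ",\n  ".join(
        f"{name} {DOLPHINDB_TO_REDSHIFT[dtype.upper()]}"
        for name, dtype in columns.items()
    )
    return f"CREATE TABLE {table} (\n  {cols}\n);"

ddl = redshift_ddl("trades", {"symbol": "STRING", "price": "DOUBLE",
                              "volume": "LONG", "tradedate": "DATE"})
print(ddl)
```

Generating DDL from one source of truth keeps the source and target schemas from drifting apart as columns are added.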


Loading Patterns: Initial and Incremental Loads

The best practice for loading data is to perform a one-off initial load, followed by incremental loads for ongoing synchronization. Both patterns can use the same Database Query component, adjusting only the query logic:

1. Initial Load

  • Approach: Extract the entire dataset from DolphinDB.
  • Query Example (No filter clause):

SELECT * FROM trades

  • Use this pattern only the first time, or when a full re-extract is needed.

2. Incremental Load

  • Approach: Extract only new or changed records since the previous load.
  • Query Example (Includes an incremental filter):

SELECT * FROM trades
    WHERE last_updated >= '2024-06-01 00:00:00'

  • Replace the timestamp with a dynamic parameter reflecting the last successful load.

For further reading, refer to the Matillion article on Incremental Load Data Replication Strategy.


Tip: Use the same Database Query component for both initial and incremental loads—just update the WHERE clause as needed for incremental replication.
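The two patterns differ only in the WHERE clause, so the query can be built from a single helper driven by the last successful watermark. In this sketch the last_updated column is an assumption; substitute whatever change-tracking column your DolphinDB table actually carries.

```python
from datetime import datetime

def extract_query(table, watermark=None):
    """Return the full initial-load query when no watermark exists,
    otherwise an incremental query filtered on the watermark.
    The last_updated column is an assumed change-tracking column."""
    base = f"SELECT * FROM {table}"
    if watermark is None:
        return base                                  # pattern 1: full load
    ts = watermark.strftime("%Y-%m-%d %H:%M:%S")
    return f"{base} WHERE last_updated >= '{ts}'"    # pattern 2: incremental

print(extract_query("trades"))
print(extract_query("trades", datetime(2024, 6, 1)))
```

In a pipeline, the watermark would come from a job variable recording the previous run's high-water mark rather than a hard-coded timestamp.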

Data Integration Architecture

Loading data into Amazon Redshift in advance of integration represents a "divide and conquer" strategy that separates the overall task into two manageable steps. This is a key benefit of the ELT (Extract, Load, Transform) architecture, where raw data is first ingested into the target system and then transformed as needed. Data integration inherently involves transforming the loaded datasets, and leveraging data transformation pipelines is recognized as the most effective approach to orchestrate and automate these transformations. A distinct advantage of the ELT methodology is that all transformation and integration activities are performed within the Redshift environment itself. This in-database processing offers significant benefits: it is fast, can be executed on-demand, and scales with Redshift’s computational resources, thus eliminating the need to invest in or manage separate data processing infrastructure.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.