Integrate data from DolphinDB to Snowflake using Matillion

Our DolphinDB to Snowflake connector enables seamless and timely data transfer to Snowflake, automatically keeping your information current without requiring manual coding or intricate ETL processes.


Extracting data from DolphinDB to Snowflake

Extracting data from DolphinDB is a critical task for organizations seeking to leverage their data assets within modern cloud data warehousing platforms such as Snowflake. Integrating these two platforms enables advanced analytics, reporting, and the consolidation of large data sets for broader business insights. In this article, we will guide you through the essential steps for configuring and executing this data pipeline. The process begins with creating a secure identity in DolphinDB to ensure controlled access to your source data. For those using Matillion as the ETL orchestration tool, we will clarify how to check for an existing JDBC driver, or obtain and configure it if necessary. Establishing robust network connectivity between DolphinDB and Snowflake is also a key prerequisite; we will outline requirements and considerations for ensuring reliable communication between your platforms. Finally, we will detail best practices for querying data—both as an initial extract and through incremental updates—to ensure your synchronization is both efficient and consistent. By following this guide, you'll be well-equipped to extract and load data from DolphinDB into Snowflake, empowering your organization to unlock new value from your data.


What is DolphinDB?

DolphinDB is a high-performance, distributed time-series database designed for efficiently storing, processing, and analyzing large volumes of structured data. Its architecture combines in-memory computing with column-oriented storage to deliver low latency and high throughput, making it particularly well-suited for applications in finance, IoT, and other data-intensive domains. DolphinDB supports a rich, SQL-like query language and provides extensive function libraries for time-series analysis, statistics, and machine learning. It offers seamless horizontal scalability through clustering, robust fault tolerance, and robust integration capabilities via native APIs in languages such as Python, Java, and C++. These features position DolphinDB as a powerful platform for building high-speed, analytics-driven solutions in environments with demanding data workloads.


What is Snowflake?

Snowflake is a cloud-native data platform designed to efficiently handle large-scale data storage, processing, and analytic workloads. Its architecture separates compute from storage, enabling linear scalability and flexible resource allocation for diverse workloads such as data warehousing, data lakes, and real-time analytics. Snowflake supports ANSI SQL, semi-structured data formats like JSON, Avro, and Parquet, and features robust data sharing capabilities across organizational boundaries via secure data exchanges. The platform provides automatic tuning, clustering, and concurrency management, reducing DBA overhead, and integrates seamlessly with major cloud providers including AWS, Azure, and Google Cloud. This makes Snowflake a compelling choice for teams seeking elastic performance, simplified management, and cross-cloud data collaboration.

Why Move Data from DolphinDB into Snowflake

The Benefits of Integrating DolphinDB Data into Snowflake for Advanced Analytics

A data engineer or architect may wish to copy data from DolphinDB into Snowflake for several compelling reasons. First, DolphinDB often contains data that is potentially valuable for broader organizational analysis or decision-making. By integrating this data with datasets from other sources within Snowflake, organizations can derive deeper insights and unlock new opportunities for value creation. Furthermore, using Snowflake as the platform for data integration enables advanced analytics and reporting capabilities while alleviating the need to run complex queries or transformations directly on DolphinDB. This approach ensures that the performance and stability of the operational DolphinDB system are preserved, as the heavy processing workload is shifted to Snowflake’s scalable and robust architecture.

Creating a User in DolphinDB

This guide outlines the steps required to create a new user (identity) in a DolphinDB database, using SQL scripts and relevant commands.

Prerequisites

  • You must have an account with administrator privileges (an admin user) in your DolphinDB instance.
  • You should be connected to your DolphinDB server via a client session or interactive console.

Step-by-Step Instructions

1. Log in as an Administrator

Ensure you are logged into DolphinDB as an administrator. Only users with admin privileges can create, modify, or delete users.

2. Create a New User

Use the built-in function `createUser` to create a user. Replace `<username>` and `<password>` with your desired username and password:

```
// Syntax
createUser(username, password)

// Example
createUser("newuser", "strongpassword123")
```

If you attempt to create a user who already exists, DolphinDB will return an error.

3. Grant Privileges (Optional)

By default, a newly created user has no privileges (other than the ability to log in). You must explicitly grant the required privileges. For example, to grant `QUERY` and `WRITE` access on database `/db1`:

```
// Grant privileges
grant("newuser", "QUERY,WRITE", "/db1")
```

Accepted privileges include `QUERY`, `READ`, `WRITE`, `ADMIN`, `MAINTAIN`, and others. You can combine multiple privileges by separating them with commas. The third argument specifies the object (a database, a table, or the system root `/`).

4. Verify the User

To confirm the user has been created, use the `userList()` function:

```
select * from userList()
```

This will display all users currently in the DolphinDB instance.

Notes

  • Passwords can be supplied as plain strings; DolphinDB encrypts them internally.
  • To change a user's password, use the `alterUser()` function.
  • For more information on user and privilege management, refer to the official DolphinDB documentation.

Installing the JDBC driver

At the time of writing, the DolphinDB JDBC driver is not included by default in Matillion Data Productivity Cloud, primarily due to licensing or redistribution restrictions. Therefore, you must manually download and install the driver to enable DolphinDB connectivity.

Step 1: Download the DolphinDB JDBC Driver

  1. Visit the official DolphinDB download page: http://www.dolphindb.com/downloads.html
  2. Locate the Type 4 JDBC driver for DolphinDB. This type of driver is generally recommended due to its platform independence and direct database communication capabilities.
  3. Download the JDBC driver `.jar` file to your local system. Be sure to review any accompanying release notes or documentation for important compatibility and version information.

Step 2: Upload the JDBC Driver to Matillion Data Productivity Cloud

  1. Follow the installation process detailed in Matillion's documentation here: https://docs.matillion.com/data-productivity-cloud/agent/docs/uploading-external-drivers/
  2. This guide explains how to upload and register external JDBC driver `.jar` files in your Matillion Agent environment.
  3. After uploading the DolphinDB JDBC driver, ensure it appears in the list of available external drivers and is enabled for use.

Step 3: Configure and Use the Driver

  1. Once successfully installed, you can proceed to configure database connections within the Matillion Data Productivity Cloud platform.
  2. Refer to the usage instructions at: https://docs.matillion.com/data-productivity-cloud/designer/docs/database-query/
  3. These instructions describe how to set up and test JDBC connections for use within your workflows.

Remember to comply with all licensing terms when downloading and distributing the DolphinDB JDBC driver within your organization.
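Once the driver is in place, a connection can be opened from any JDBC-capable client. The sketch below uses Python's `jaydebeapi` bridge purely as an illustration; the URL format (`jdbc:dolphindb://host:port`) and the driver class name `com.dolphindb.jdbc.Driver` are assumptions based on common JDBC conventions, so verify both against the documentation that ships with the `.jar` you downloaded.

```python
# Illustrative sketch: opening a DolphinDB JDBC connection from Python.
# The URL format and driver class name below are ASSUMPTIONS -- check them
# against the release notes bundled with your driver .jar before use.

def build_jdbc_url(host: str, port: int) -> str:
    """Build a DolphinDB JDBC connection URL (assumed format)."""
    return f"jdbc:dolphindb://{host}:{port}"

def connect(host: str, port: int, user: str, password: str, jar_path: str):
    """Open a JDBC connection via the jaydebeapi bridge (requires a JVM)."""
    import jaydebeapi  # pip install jaydebeapi
    return jaydebeapi.connect(
        "com.dolphindb.jdbc.Driver",   # assumed driver class name
        build_jdbc_url(host, port),
        [user, password],
        jar_path,                      # path to the downloaded .jar file
    )

if __name__ == "__main__":
    # 8848 is DolphinDB's default port; the hostname is a placeholder.
    print(build_jdbc_url("db.example.com", 8848))
```

Matillion itself handles the connection through its Database Query component, so this snippet is mainly useful for verifying the driver and credentials outside the platform before wiring up a pipeline.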

Checking network connectivity

To ensure successful connectivity between Matillion Data Productivity Cloud and your DolphinDB database, you must configure your database to accept incoming connections from the Matillion agent. The exact firewall and network rules depend on your deployment type (Full SaaS or Hybrid SaaS).

Additionally, if you are referencing your DolphinDB database using a DNS name, ensure that the Full SaaS or Hybrid SaaS agent can resolve the DNS address successfully. This is required for establishing the connection.

Querying Data from a DolphinDB Database

This guide provides technical instructions on querying data from a DolphinDB database, with examples of SQL SELECT statements and best practices for data loading patterns. This approach is relevant when integrating DolphinDB with external systems such as Snowflake, where type conversions and incremental loading strategies may apply.


SQL SELECT Query Examples in DolphinDB

DolphinDB supports both its own scripting language and a subset of standard SQL. Here are some common SQL SELECT statement examples:

```
-- Select all columns and rows from a table
SELECT * FROM trades;

-- Select specific columns
SELECT symbol, price, volume FROM trades;

-- Filter data with a WHERE clause
SELECT symbol, price FROM trades WHERE volume > 10000;

-- Aggregation example
SELECT symbol, AVG(price) AS avg_price
FROM trades
WHERE trade_date BETWEEN 2023.01.01 AND 2023.06.30
GROUP BY symbol;

-- Order results
SELECT * FROM trades ORDER BY trade_date DESC LIMIT 10;
```


Datatype Conversion between DolphinDB and Snowflake

When transferring data between DolphinDB and Snowflake, keep in mind the possibility of datatype conversion. For instance:

  • DolphinDB's `DATE` type typically maps to `DATE` in Snowflake.
  • `TIMESTAMP` formats might need parsing.
  • Numeric datatypes (`INT`, `FLOAT`, `DOUBLE`) are usually compatible, but confirm scale and precision.
  • String encodings should be checked for compatibility (UTF-8 is standard in both).

Careful review and testing of datatype mapping is crucial to avoid unexpected issues during data replication.
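One practical way to make that review concrete is to maintain an explicit type-mapping table in your pipeline code. The mapping below is an illustrative sketch of reasonable defaults, not an official reference; validate every entry against your actual DolphinDB schemas and Snowflake DDL before relying on it.

```python
# Illustrative DolphinDB -> Snowflake type mapping for load planning.
# These are ASSUMED defaults, not an official mapping -- verify each entry
# against your actual schemas.

DOLPHINDB_TO_SNOWFLAKE = {
    "BOOL":      "BOOLEAN",
    "INT":       "INTEGER",
    "LONG":      "BIGINT",
    "FLOAT":     "FLOAT",
    "DOUBLE":    "DOUBLE",
    "DATE":      "DATE",
    "TIMESTAMP": "TIMESTAMP_NTZ",  # confirm time-zone handling in your pipeline
    "STRING":    "VARCHAR",
    "SYMBOL":    "VARCHAR",        # DolphinDB's interned-string type
}

def map_type(dolphindb_type: str) -> str:
    """Look up a Snowflake target type, defaulting to VARCHAR for unknowns."""
    return DOLPHINDB_TO_SNOWFLAKE.get(dolphindb_type.upper(), "VARCHAR")
```

Centralizing the mapping like this makes it easy to review in code review and to extend when new column types appear in the source tables.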


Best Pattern: Initial Load and Incremental Loads

The most effective approach for synchronizing DolphinDB data to data warehouses (e.g., Snowflake) is:

  1. Initial Load
     • Extract all data at once.
     • Typically performed only once per table.
     • Use a Database Query component without a filter clause to select all records.
  2. Incremental Load
     • Extract only new or changed records since the last successful extraction.
     • Use the same Database Query component, this time with a filter clause (e.g., on an `updated_at` or `timestamp` column).

This strategy ensures efficient extraction and reduces ongoing data transfer costs. See details at Matillion Exchange: Incremental Load Data Replication Strategy.

Example Patterns

Initial Load (No Filter Clause)

```
SELECT * FROM trades
```

Incremental Load (With Filter Clause)

Suppose `last_extracted_timestamp` is a parameter:

```
SELECT * FROM trades
WHERE trade_time > '${last_extracted_timestamp}'
```

Both approaches utilize the same Database Query component, with the filter clause applied dynamically for incremental loads.
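The initial/incremental pattern above can be sketched as a small query-building helper. The table and column names (`trades`, `trade_time`) follow the examples in this article; how the watermark value is persisted between runs is left abstract.

```python
# Minimal sketch of the initial-vs-incremental extraction pattern.
# "trades" and "trade_time" follow this article's examples; in a real
# pipeline the watermark value would be stored between runs (e.g., in a
# control table) rather than passed in by hand.

def build_extract_query(table, watermark_column=None, last_extracted=None):
    """Return a full-extract query, or an incremental one when a watermark
    value from the previous successful run is available."""
    query = f"SELECT * FROM {table}"
    if watermark_column and last_extracted:
        query += f" WHERE {watermark_column} > '{last_extracted}'"
    return query

# Initial load: no watermark recorded yet.
print(build_extract_query("trades"))
# -> SELECT * FROM trades

# Incremental load: filter on the watermark saved after the last run.
print(build_extract_query("trades", "trade_time", "2023.06.30 23:59:59"))
# -> SELECT * FROM trades WHERE trade_time > '2023.06.30 23:59:59'
```

After each successful incremental run, the maximum `trade_time` seen in the extracted batch becomes the watermark for the next run, which is what keeps the synchronization both efficient and consistent.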


For more on incremental replication, refer to Matillion's documentation.

Data Integration Architecture

An essential advantage of the ELT (Extract, Load, Transform) architecture is its ability to split the data integration process into two discrete steps by first loading data into Snowflake in advance of integration. This "divide and conquer" approach simplifies the overall workflow, allowing organizations to ingest large volumes of raw data quickly and address complex transformation requirements separately. Data integration inherently involves various data transformations—such as cleansing, standardization, and enrichment—and the most effective way to perform these tasks is through dedicated data transformation pipelines. With ELT, these transformations occur natively within the Snowflake database, leveraging its robust processing power. This design provides several benefits: data transformation and integration are executed quickly and on-demand, the process is highly scalable to match evolving workloads, and organizations do not need to invest in or maintain separate data processing infrastructure, resulting in both operational efficiency and cost savings.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.