Integrate data from MemSQL (SingleStore) to Amazon Redshift using Matillion

Our SingleStore to Amazon Redshift connector enables seamless, real-time data transfer within minutes, eliminating the need for manual coding or complex ETL processes.


Extracting data from MemSQL (SingleStore) to Amazon Redshift

Extracting data from SingleStore is a common requirement for organizations seeking to leverage the analytical capabilities of Amazon Redshift. Moving data between these two platforms can support advanced reporting, machine learning initiatives, and data consolidation efforts. In this article, we will guide you through the key steps involved in this process. We will begin by outlining how to create an identity in SingleStore to securely access your source data. For users leveraging Matillion for orchestration, we will detail how to check or acquire the required JDBC driver to establish connectivity. Next, we’ll discuss best practices for configuring network connectivity between SingleStore and Redshift, addressing security and performance considerations. Finally, we’ll cover techniques for querying data in both initial full-load and incremental modes, ensuring efficient synchronization between your operational and analytical environments. By the end of this article, you will be equipped with the knowledge necessary to reliably migrate and synchronize data from SingleStore to Redshift.


What is MemSQL (SingleStore)?

SingleStore is a distributed SQL database designed for high-performance, real-time analytics and transactional workloads. Combining in-memory rowstore and disk-based columnstore capabilities, SingleStore supports mixed workloads on a unified engine, enabling both OLTP and OLAP operations with minimal latency. It features horizontal scalability, strong consistency through snapshot isolation, and native support for standard SQL, including window functions and geospatial queries. With its lock-free data structures and vectorized query execution, SingleStore optimizes for both streaming and batch data processing. The platform supports cloud-native deployments, Kubernetes orchestration, and integration with data ecosystems via standard drivers and connectors.


What is Amazon Redshift?

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the AWS cloud, optimized for high-performance analytic queries on large datasets. Leveraging columnar storage, parallel processing, and compressed data formats, Redshift minimizes I/O and enables efficient querying, making it well-suited for OLAP (Online Analytical Processing) workloads and complex reporting. By supporting standard SQL and Postgres extensions, Redshift facilitates seamless integration with existing business intelligence tools and ETL pipelines. Notably, Redshift’s architecture allows users to scale compute and storage resources independently, enhancing flexibility and cost efficiency. Features such as concurrency scaling, materialized views, and AQUA (Advanced Query Accelerator) further optimize query performance for demanding analytics use cases.

Why Move Data from MemSQL (SingleStore) into Amazon Redshift

The Case for Replicating SingleStore Data to Redshift for Advanced Analytics

A data engineer or architect may choose to copy data from SingleStore into Redshift for several compelling reasons. Firstly, SingleStore often contains data that is potentially valuable for business analytics or operational reporting. However, the true value of this data is often realized when it is integrated with information from other sources; combining SingleStore data with datasets from additional platforms can yield deeper insights and foster more comprehensive analysis. By transferring the data into Redshift, which is purpose-built for scalable analytics and warehousing, organizations can perform complex integrations and analytical workloads without imposing additional demand on their operational SingleStore environment. This approach not only preserves the performance of SingleStore for transactional workloads but also leverages Redshift's strengths in managing large-scale data integration and analysis.

Creating a User in SingleStore

To create a user in a SingleStore database, you typically use the `CREATE USER` SQL statement. You can customize privileges and authentication methods as needed. The following steps guide you through the process.

1. Connect to the Database

Start by connecting to your SingleStore instance using a SQL client or the command line:

mysql -u root -p -h <hostname> -P <port>

Replace `<hostname>` and `<port>` with your SingleStore host and port information.

2. Create the User

Execute the following SQL command to create a new user:

CREATE USER 'new_user'@'%' IDENTIFIED BY 'StrongPassword!';

  • Replace `'new_user'` with the desired username.
  • The `'%'` wildcard allows the user to connect from any host. For increased security, you may specify a particular IP address or hostname instead.
  • Replace `'StrongPassword!'` with a securely chosen password.

Example

CREATE USER 'analytics'@'10.20.30.40' IDENTIFIED BY 'Comp!exPa$$w0rd';

This command creates the user `analytics`, who can connect only from the host with IP address `10.20.30.40`.

3. Grant Privileges (Optional)

By default, the new user has no permissions. Grant appropriate privileges with the `GRANT` statement:

GRANT SELECT, INSERT, UPDATE ON database_name.* TO 'new_user'@'%';

To grant all privileges on a database:

GRANT ALL PRIVILEGES ON database_name.* TO 'new_user'@'%';

4. Apply Privilege Changes

After modifying privileges, run the following command to flush the privileges (optional, typically not required in SingleStore, but safe to include if integrating with legacy procedures):

FLUSH PRIVILEGES;
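Steps 2 through 4 above can also be scripted. The sketch below is a minimal illustration (the user, host, password, and database names are the hypothetical values from this guide) that builds the provisioning statements as strings; you could then execute them through any MySQL-compatible client, since SingleStore speaks the MySQL wire protocol:

```python
def provisioning_statements(user, host, password, database):
    """Build the CREATE USER / GRANT / FLUSH statements from steps 2-4."""
    return [
        f"CREATE USER '{user}'@'{host}' IDENTIFIED BY '{password}';",
        f"GRANT SELECT, INSERT, UPDATE ON {database}.* TO '{user}'@'{host}';",
        "FLUSH PRIVILEGES;",
    ]

# Example: provision a read/write user restricted to one host
for stmt in provisioning_statements("analytics", "10.20.30.40",
                                    "Comp!exPa$$w0rd", "sales"):
    print(stmt)
```

Generating the statements rather than hand-typing them makes it easy to provision the same user consistently across development and production clusters.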

5. Verify User Creation

To list all users:

SHOW USERS;

The new user should appear in the results.


For more information, see the SingleStore documentation on user management.

Installing the JDBC Driver

At the time of writing, the SingleStore JDBC driver is not included by default in the Matillion Data Productivity Cloud due to licensing or redistribution restrictions. As a result, you must manually download and install the driver in order to connect Matillion to your SingleStore database.

1. Downloading the JDBC Driver

Navigate to the official SingleStore documentation at:
https://docs.singlestore.com/managed-service/en/reference/jdbc-driver.html

Here you will find download links for the SingleStore JDBC driver. When choosing a driver, prioritize a Type 4 JDBC driver, as this is a pure Java implementation and is best suited for JDBC-compatible applications like Matillion.

2. Preparing for Installation

Download the JDBC driver `.jar` file and store it in a secure location, noting its version for future reference. Ensure that you have the appropriate permissions to upload external drivers in your Matillion environment.

3. Uploading the JDBC Driver to Matillion Data Productivity Cloud

Instructions for installing external JDBC drivers into Matillion can be found at:
https://docs.matillion.com/data-productivity-cloud/agent/docs/uploading-external-drivers/

Follow these steps as described in the documentation:

  • Access your Matillion instance and navigate to the section for managing or uploading external drivers within your Matillion agent.
  • Use the Matillion interface to upload the previously downloaded SingleStore JDBC `.jar` file.
  • Ensure the driver appears in the list of available drivers and that it is correctly recognized by the platform.

If the Matillion environment is managed by your IT or DevOps team, you may need to coordinate with them to complete the installation.

4. Connecting to SingleStore Using Matillion

After the driver is successfully uploaded and installed, consult the detailed instructions for configuring and using database connectivity within Matillion at:
https://docs.matillion.com/data-productivity-cloud/designer/docs/database-query/

This documentation covers how to establish database connections, specify driver properties, and build queries using Matillion's interface to connect to your SingleStore instance.
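The connection details you enter ultimately reduce to a JDBC URL. As a rough sketch (the host, port, and database below are placeholders; consult the SingleStore JDBC documentation for the full set of supported connection properties), the SingleStore Type 4 driver uses the `jdbc:singlestore://` scheme:

```python
def singlestore_jdbc_url(host, port, database, **props):
    """Assemble a JDBC URL in the form the SingleStore driver expects.

    Extra keyword arguments become ?key=value connection properties.
    """
    url = f"jdbc:singlestore://{host}:{port}/{database}"
    if props:
        url += "?" + "&".join(f"{k}={v}" for k, v in sorted(props.items()))
    return url

# Hypothetical connection for the user created earlier
print(singlestore_jdbc_url("db.example.com", 3306, "sales", user="analytics"))
```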

By following these steps, you can enable SingleStore connectivity in your Matillion Data Productivity Cloud environment.

Checking network connectivity

To ensure successful integration between Matillion Data Productivity Cloud and your SingleStore database, you must allow incoming connections to your SingleStore instance based on your deployment type:

  • Full SaaS agent configuration:
    Configure your SingleStore firewall or network security settings to permit incoming connections from the set of IP addresses used by Matillion's SaaS agents. The current list of these IP addresses can be found here: Matillion SaaS allowed IP addresses.

  • Hybrid SaaS deployment:
    In this scenario, your Matillion agent operates within your own environment, typically inside a virtual private cloud (VPC). You must configure the SingleStore database to accept incoming connections from the relevant VPC IP ranges. To verify your setup or to troubleshoot connectivity, you can use the network access check utilities provided by Matillion: Matillion network access utilities.

Note:
If your SingleStore database is referenced using a DNS name (instead of a static IP address), ensure that the Matillion agent (whether Full SaaS or Hybrid SaaS) is able to resolve the DNS address to the correct IP. This may involve configuring DNS resolution within your environment or network as required.
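A quick way to verify the note above — that the agent's network can both resolve the DNS name and reach the database port — is a small connectivity probe. This is a generic sketch; substitute your actual SingleStore host and port (typically 3306):

```python
import socket

def can_reach(host, port, timeout=3.0):
    """Resolve `host` via DNS, then attempt a TCP connection to `port`."""
    try:
        ip = socket.gethostbyname(host)          # DNS resolution step
    except socket.gaierror:
        return False, "DNS resolution failed"
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True, f"connected to {ip}:{port}"
    except OSError as exc:
        return False, f"resolved to {ip}, but TCP connect failed: {exc}"

# Example (placeholder host):
# ok, detail = can_reach("singlestore.internal.example", 3306)
```

Running this from the machine hosting the Matillion agent distinguishes DNS failures from firewall or routing failures, which narrows down which of the settings above needs adjustment.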

Querying Data from a SingleStore Database

This guide outlines how to execute SQL queries against a SingleStore database, manage datatype conversion with Redshift, and implement efficient data extraction patterns—focusing on initial and incremental data loads.


1. Example: Query Data using SQL SELECT Statements

You can query data from SingleStore using standard SQL syntax. Below are some example queries:

```sql
-- Select all columns from the 'customers' table
SELECT * FROM customers;

-- Get specified columns from the 'orders' table
SELECT order_id, customer_id, amount, status FROM orders;

-- Filter results with a WHERE clause
SELECT product_id, product_name, price
FROM products
WHERE price > 100 AND in_stock = TRUE;

-- Aggregate data
SELECT region, COUNT(*) AS total_customers
FROM customers
GROUP BY region;
```

2. Datatype Conversion: SingleStore & Redshift

When moving data between SingleStore and Amazon Redshift, keep in mind that datatype conversion may be required. Common considerations include:

| SingleStore Type | Comparable Redshift Type |
| ---------------- | ------------------------ |
| VARCHAR          | VARCHAR                  |
| BIGINT           | BIGINT                   |
| DATETIME         | TIMESTAMP                |
| BOOLEAN          | BOOLEAN                  |
| FLOAT            | DOUBLE PRECISION         |

Ensure any transformations account for these differences prior to or during extraction/ingestion.
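One way to make the mapping explicit in a transformation step is a simple lookup table. The sketch below covers only the rows from the table above; you would extend it for the remaining types in your schema:

```python
# Type mapping from the table above; extend for additional types as needed.
SINGLESTORE_TO_REDSHIFT = {
    "VARCHAR": "VARCHAR",
    "BIGINT": "BIGINT",
    "DATETIME": "TIMESTAMP",
    "BOOLEAN": "BOOLEAN",
    "FLOAT": "DOUBLE PRECISION",
}

def redshift_type(singlestore_type):
    """Return the comparable Redshift type, or raise for unmapped types."""
    try:
        return SINGLESTORE_TO_REDSHIFT[singlestore_type.upper()]
    except KeyError:
        raise ValueError(f"no mapping defined for {singlestore_type!r}")

print(redshift_type("datetime"))  # TIMESTAMP
```

Failing loudly on unmapped types is deliberate: a silent pass-through of an unknown type is a common source of ingestion errors.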

3. Best Practice: Initial vs Incremental Data Loads

For both efficiency and data quality, it is best to design your process as:

  1. Once-off initial load – extracts all data, typically performed only the first time.
  2. Recurring incremental loads – extract only new or modified data since the last load.

You can use the same Matillion "Database Query" component for both steps.

A. Initial Load (No Filter)

During an initial load, query the entire table—no filtering clause is necessary:

SELECT *
FROM orders;

B. Incremental Load (With Filter)

Subsequent loads use a filter condition (often on a timestamp or an auto-incrementing ID) to retrieve only new or changed rows:

SELECT *
FROM orders
WHERE updated_at > '2024-06-01 00:00:00';

Use your replication checkpoint value in the filter clause. This pattern improves efficiency by minimizing scanned and transferred data.
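The checkpoint pattern can be sketched as follows. This is illustrative only — the table and column names are the examples from this section, and in Matillion the checkpoint value would typically live in a pipeline variable or a small control table:

```python
def incremental_query(table, cursor_column, checkpoint):
    """Build the extraction query for a load.

    A None checkpoint means this is the once-off initial (full) load;
    otherwise only rows changed after the checkpoint are selected.
    """
    base = f"SELECT * FROM {table}"
    if checkpoint is None:
        return base + ";"
    return f"{base} WHERE {cursor_column} > '{checkpoint}';"

print(incremental_query("orders", "updated_at", None))
print(incremental_query("orders", "updated_at", "2024-06-01 00:00:00"))
```

After each successful load, the new checkpoint is the maximum `updated_at` value actually extracted, so the next run picks up exactly where the last one left off.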

Tip: For further guidance on incremental load strategies using Matillion, refer to:
Incremental Load Data Replication Strategy – Matillion Exchange

Data Integration Architecture

Loading data into Amazon Redshift in advance of integration exemplifies the "divide and conquer" strategy by separating the overall process into distinct stages—first, raw data ingestion (load), followed by integration (transform). This approach is a key advantage of the ELT (Extract, Load, Transform) architecture, as it allows organizations to decouple the heavy-lifting of data ingestion from the complexity of integration logic. Effective data integration relies on transformation, and the most robust way to achieve this is through the use of data transformation pipelines that orchestrate, schedule, and manage each step. With ELT, all transformations and integration processes take place directly within the Redshift database itself. This provides fast, on-demand, and highly scalable data processing, eliminating the need for separate transformation servers and allowing organizations to avoid additional costs and complexity associated with external data-processing infrastructure.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.