Integrate data from Exasol to Amazon Redshift using Matillion

Our Exasol to Redshift connector enables seamless, automatic data transfer to Redshift within minutes, ensuring your information remains current without the need for manual coding or complex ETL processes.


Extracting data from Exasol to Amazon Redshift

Extracting data from Exasol is a key step for organizations seeking to leverage Amazon Redshift’s powerful analytics and scalability. Efficiently transferring data between these environments ensures that your reporting, analytics, and downstream processing workflows remain up-to-date and reliable. In this article, we will guide you through the essential steps required to extract data from Exasol and load it into Redshift. We will cover creating a user identity in Exasol to enable secure data access, and provide advice for Matillion users on acquiring or checking the appropriate JDBC driver required to connect to Exasol. Additionally, we’ll discuss important network connectivity considerations to ensure smooth data flow between the source and the target systems. Finally, we’ll outline methods for querying and transferring data—whether performing an initial full load or capturing incremental changes for ongoing updates. By following these steps, you will be well-prepared to establish a robust and efficient data migration pipeline between Exasol and Redshift.


What is Exasol?

Exasol is a high-performance, in-memory, columnar MPP relational database designed for analytics and data warehousing. Its distributed architecture enables rapid, scalable query execution on large datasets. Exasol uses advanced in-memory processing, intelligent indexing, and metadata management to deliver sub-second analytic queries. It supports standard SQL, popular BI tools, and user-defined functions in Python, R, Java, and Lua. Exasol emphasizes simple deployment, straightforward administration, and ACID compliance, making it ideal for mission-critical analytics.


What is Amazon Redshift?

Amazon Redshift is a fully managed, cloud-based data warehouse service within the AWS ecosystem, designed to efficiently handle large-scale data analytics workloads. It utilizes columnar storage and massively parallel processing (MPP) to accelerate query performance on datasets ranging from gigabytes to petabytes. Redshift supports standard SQL and integrates seamlessly with a variety of ETL tools, BI applications, and AWS services. Advanced features such as workload management, data encryption at rest and in transit, automatic backups, flexible scaling, and support for both structured and semi-structured data (via Redshift Spectrum) make it a robust choice for organizations requiring scalable, high-performance analytics capabilities in the cloud.

Why Move Data from Exasol into Amazon Redshift

The Benefits of Copying Data from Exasol to Redshift for Advanced Analytics

Data engineers and architects may find it advantageous to copy data from Exasol into Amazon Redshift for several reasons. Firstly, Exasol often houses data that holds significant business value, particularly in analytical contexts. However, the full potential of this data is more likely to be realized when it is integrated with information from other sources, forming a more comprehensive dataset for analysis or reporting. By transferring the Exasol data into Redshift, professionals can perform such data integration within Redshift’s scalable environment, thereby avoiding any additional computational load on the Exasol system. This approach not only preserves the performance of Exasol for its primary workloads but also leverages Redshift’s strengths in handling diverse, large-scale analytical queries across multiple datasets.

Creating a User in Exasol Database

To create a user (identity) in Exasol, you require sufficient privileges (typically the `CREATE USER` privilege or the `DBA` role). Users are important for resource isolation, security, and audit tracking within the database. The following instructions detail how to create a user with a specified password and optional permissions.

Prerequisites

  • You must connect to the Exasol database as a user with sufficient privileges (e.g., the `SYS` user or another user with the `CREATE USER` privilege).

Example SQL Script to Create a User

```
-- Create a new user 'new_user' with the password 'StrongP@ssw0rd'
CREATE USER new_user IDENTIFIED BY 'StrongP@ssw0rd';
```

Note:
- Exasol's password policy may enforce complexity requirements (length, characters, etc.).
- Adjust `new_user` and the password as appropriate for your environment.

Granting Privileges to the New User

After creating the user, you should grant the necessary privileges, such as SESSION or object-level privileges.

```
-- Grant the user permission to start a database session
GRANT CREATE SESSION TO new_user;

-- Optionally, grant access to a specific schema
GRANT SELECT, INSERT, UPDATE, DELETE ON SCHEMA your_schema TO new_user;
```

Replace `your_schema` with the relevant schema name. You can tailor permissions based on the principle of least privilege.

Changing a User's Password (Optional)

To enforce password updates or assist users in resetting passwords:

```
-- Change the password for an existing user
ALTER USER new_user IDENTIFIED BY 'NewStrongerP@ssw0rd';
```

Dropping a User (Optional)

To remove a user when no longer needed:

```
-- Drop the user from the database
DROP USER new_user;
```


With these instructions, you can create, configure, and manage users within the Exasol database environment using SQL scripts.
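When provisioning more than a handful of users, the DDL above can be generated programmatically. The following Python sketch is illustrative only (the function name and its simple quoting rules are our own, not part of any official Exasol client library), and it assumes the caller has already validated the username against Exasol's identifier rules:

```python
def build_user_provisioning_sql(username, password, schema=None):
    """Build the CREATE USER and GRANT statements described above.

    Single quotes in the password are doubled for SQL string escaping.
    The caller is assumed to validate `username` as a safe identifier.
    """
    escaped_pw = password.replace("'", "''")
    statements = [
        f"CREATE USER {username} IDENTIFIED BY '{escaped_pw}';",
        f"GRANT CREATE SESSION TO {username};",
    ]
    if schema:
        # Optional object-level grants on a single schema.
        statements.append(
            f"GRANT SELECT, INSERT, UPDATE, DELETE ON SCHEMA {schema} TO {username};"
        )
    return statements

for stmt in build_user_provisioning_sql("new_user", "StrongP@ssw0rd", "your_schema"):
    print(stmt)
```

Generating the statements rather than hand-writing them keeps the grants consistent with the principle of least privilege across many users.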

Installing the JDBC driver

The Exasol JDBC driver is required to connect Matillion Data Productivity Cloud to your Exasol database. Due to licensing and redistribution restrictions, the Exasol JDBC driver is not bundled with the product by default. You will need to manually obtain and install the driver as follows:

1. Download the Exasol JDBC Driver

  1. Visit the official Exasol JDBC Drivers download page:
    https://www.exasol.com/portal/display/DOWNLOADS/JDBC+Drivers
  2. On the download page, locate the Type 4 JDBC driver for your desired version. The Type 4 driver is preferred, as it is a "pure Java" solution and compatible with most Java-based applications including Matillion.
  3. Download the appropriate
    .jar
    file for the JDBC driver and make sure to note the location where you saved it.

2. Upload and Install the Driver Within Matillion

  1. Refer to Matillion's official documentation for uploading external JDBC drivers:
    https://docs.matillion.com/data-productivity-cloud/agent/docs/uploading-external-drivers/
  2. Follow the documented procedure to upload the downloaded
    .jar
    file to your Matillion agent or runtime environment.
  3. Ensure you have the necessary permissions to upload external files or drivers.
  4. Restart the relevant Matillion components or agent if requested, to recognize the newly added driver.

3. Configuring and Using the Driver

By following these steps, you will add and enable the Exasol JDBC Type 4 driver for use in your Matillion Data Productivity Cloud environment.

Checking network connectivity

To enable communication between Matillion Data Productivity Cloud and your Exasol database, you must ensure that the Exasol database accepts incoming connections from the Matillion agent. The exact network and firewall rules depend on your deployment type (Full SaaS or Hybrid SaaS).

Additionally, if your Exasol database is referenced using a DNS name instead of a direct IP address, ensure that the Full SaaS or Hybrid SaaS agent is able to resolve the DNS address of the Exasol database. This may require appropriate DNS configuration or firewall permissions for DNS traffic.
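A quick way to verify both DNS resolution and port reachability from the machine running the agent is a plain TCP connection test. Exasol's default client port is 8563, though your deployment may use a different one; note this sketch only proves the port accepts connections, not that your credentials or TLS settings are correct:

```python
import socket

def port_is_reachable(host, port=8563, timeout=5.0):
    """Try to open a TCP connection to host:port within `timeout` seconds.

    Returns True if the connection succeeds, and False on connection
    refusal, timeout, or DNS resolution failure (all raise OSError).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `port_is_reachable("exasol.internal.example.com")` (a hypothetical hostname) returning False for a host you can reach by IP address is a strong hint that the agent cannot resolve the DNS name, which points back to the DNS configuration issue described above.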

Querying Data from Exasol Database

This guide explains how to query data from an Exasol database, including best practices for initial and incremental loads, as well as an overview of SQL syntax and considerations for datatype conversion when moving data to Redshift.


Query Examples

Below are several basic SQL `SELECT` statements that you can use to query data from Exasol:

```
-- Select all columns and rows from a table
SELECT * FROM schema_name.table_name;

-- Select specific columns
SELECT id, name, created_at FROM schema_name.table_name;

-- Filtering data with a WHERE clause
SELECT * FROM schema_name.table_name WHERE status = 'active';

-- Aggregate data with GROUP BY
SELECT status, COUNT(*) AS record_count FROM schema_name.table_name GROUP BY status;
```


Datatype Conversion: Exasol vs. Redshift

Be aware that data types in Exasol may not have direct equivalents in Amazon Redshift. Common conversions to consider are:

| Exasol Type | Typical Redshift Equivalent |
| ----------- | --------------------------- |
| VARCHAR     | VARCHAR                     |
| CHAR        | CHAR                        |
| DECIMAL     | NUMERIC                     |
| DOUBLE      | DOUBLE PRECISION            |
| DATE        | DATE                        |
| TIMESTAMP   | TIMESTAMP                   |
| BOOL        | BOOLEAN                     |

Carefully review your schema for compatibility, as implicit or explicit datatype conversions may be necessary during ELT/ETL processes.
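The mapping in the table above can be expressed as a simple lookup when automating schema review. This sketch handles only the types listed: precision and scale arguments (e.g. `DECIMAL(18,2)`) are carried over unchanged, and any unlisted type raises an error so it surfaces during review rather than silently mismapping:

```python
# Exasol-to-Redshift base type mapping, per the table above.
EXASOL_TO_REDSHIFT = {
    "VARCHAR": "VARCHAR",
    "CHAR": "CHAR",
    "DECIMAL": "NUMERIC",
    "DOUBLE": "DOUBLE PRECISION",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP",
    "BOOL": "BOOLEAN",
}

def map_type(exasol_type):
    """Translate an Exasol column type to its Redshift equivalent.

    Size arguments such as (100) or (18,2) are preserved verbatim.
    Raises ValueError for types outside the mapping table.
    """
    base, sep, args = exasol_type.upper().partition("(")
    base = base.strip()
    if base not in EXASOL_TO_REDSHIFT:
        raise ValueError(f"No known Redshift equivalent for {base}")
    return EXASOL_TO_REDSHIFT[base] + (sep + args if sep else "")
```

For example, `map_type("DECIMAL(18,2)")` yields `NUMERIC(18,2)`. Types not covered here (large VARCHAR lengths, interval or geospatial types) need case-by-case decisions during your schema review.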


Best Pattern for Loading Data

Recommended Strategy:
Perform a once-off full (initial) load, followed by periodic incremental loads for ongoing synchronization. Both approaches can use the same Database Query component.

1. Initial Load

  • Description: Fetch all rows from the source table.
  • Query Pattern: No filter on the source data.

```
SELECT *
FROM schema_name.table_name;
```

2. Incremental Loads

  • Description: Retrieve only new or updated records since the last load, improving performance and reducing data volume.
  • Query Pattern: Add a filter clause (typically on a timestamp or ID column).

```
SELECT *
FROM schema_name.table_name
WHERE updated_at > '${LAST_SUCCESSFUL_LOAD_TIMESTAMP}';
```

Replace `updated_at` with your table's column for detecting changes, and `${LAST_SUCCESSFUL_LOAD_TIMESTAMP}` with the appropriate reference in your environment.

Tip: See the full incremental loading pattern and best practices at Matillion Exchange: Incremental Load / Data Replication Strategy.


Using the Database Query Component

  • Use the same Database Query component for both initial and incremental data loads.
  • For the initial load, use an unfiltered query.
  • For incremental loads, include the filter (WHERE clause) as demonstrated above.

This unified approach simplifies orchestration and maintenance of your data pipelines.

Data Integration Architecture

Loading data into Amazon Redshift in advance of integration offers a pragmatic way to tackle the challenges of data management by breaking the process into two separate and manageable steps, namely loading and transformation. This approach exemplifies a key advantage of the Extract, Load, and Transform (ELT) architecture. In the context of data integration, transforming data to match business requirements or data models is essential, and the most effective way to achieve this is through structured data transformation pipelines. These pipelines allow organizations to automate complex transformations, ensuring consistency and reliability. Additionally, another significant benefit of the ELT architecture is that both data transformation and integration processes are executed directly within the Redshift environment. This not only leverages the performance and scalability of the target database, but also delivers transformations quickly and on demand, without the need to invest in and maintain separate data processing infrastructure.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.