Integrate data from Oracle Database to Amazon Redshift using Matillion

Our Oracle Database to Redshift connector seamlessly transfers your data to Redshift within minutes, ensuring it remains current without the need for manual coding or managing complex ETL scripts.


Extracting data from Oracle Database to Amazon Redshift

Extracting data from Oracle Database is a common yet critical step in building modern analytics pipelines, especially when migrating or synchronizing data to cloud platforms like Amazon Redshift. This article provides a step-by-step guide to streamline this process, with a particular focus on users leveraging Matillion ETL. We will begin by detailing how to create an appropriate identity in the Oracle Database to ensure secure and efficient access for extraction tasks. Next, for those using Matillion, we will explain how to check for—or acquire—the necessary JDBC driver to enable a seamless connection between Oracle and Matillion. The article will also outline the essential network connectivity requirements to ensure reliable, secure communication between your Oracle source database and Amazon Redshift target. Finally, we'll cover the process of querying data: first through a full extraction of initial datasets, and subsequently by implementing incremental data loads for ongoing synchronization. Whether you are undertaking a migration project or automating recurring data integration, this guide will help you navigate the key steps involved.


What is Oracle Database?

Oracle Database is a robust, enterprise-grade relational database management system (RDBMS) developed by Oracle Corporation. Recognized for its scalability, reliability, and advanced feature set, Oracle Database supports both transactional and analytical workloads, and provides comprehensive tools for database administration, security, and backup and recovery. Offering multi-model data management capabilities, it enables users to work with structured, semi-structured, and unstructured data, and is highly optimized for deployment in both on-premises and cloud environments. Advanced features such as Real Application Clusters (RAC), Automatic Storage Management (ASM), and pluggable databases deliver high availability, flexible resource utilization, and simplified management, making Oracle Database a preferred choice for mission-critical applications across diverse industries.


What is Amazon Redshift?

Amazon Redshift is a fully managed, cloud-based data warehouse built for fast, scalable analytics on large datasets. Using a columnar architecture and Massively Parallel Processing (MPP), it delivers high-performance queries for complex business intelligence. Redshift integrates seamlessly with AWS, supports both structured and semi-structured data via Amazon S3 and Redshift Spectrum, and works with various data and BI tools. Key features include advanced compression, automatic vacuuming, concurrency scaling, and workload management. Redshift enables efficient, cost-effective querying across diverse sources without needing to move your data.

Why Move Data from Oracle Database into Amazon Redshift

Unlocking Advanced Analytics: The Value of Copying Data from Oracle to Redshift

A data engineer or architect might choose to copy data from Oracle Database into Amazon Redshift for several compelling reasons. Firstly, Oracle Database often stores business-critical information that can yield valuable insights when analyzed effectively. Integrating this data with information from other sources—such as transactional systems, log files, or cloud-native applications—enables organizations to gain a more comprehensive and actionable view of their operations. By leveraging Amazon Redshift for this integration and subsequent analysis, teams can fully utilize Oracle’s data without imposing additional workload or performance overhead on the source database. This approach not only preserves Oracle’s primary function for transactional processing but also maximizes the value derived from its data in a scalable, high-performance analytics environment.

Creating an Identity in Oracle Database

Creating a user in Oracle Database involves several steps: connecting as a privileged user, executing a `CREATE USER` statement, and (optionally) granting privileges. This guide provides concise instructions suitable for DBA and technical audiences.

Prerequisites

  • An accessible Oracle Database instance
  • DBA privileges (e.g., access via a user such as `sys` or `system`, or another user with the `CREATE USER` privilege)

1. Connect as a Privileged User

Connect to the database as a user with sufficient privileges, usually as `sysdba`:

```bash
sqlplus sys@ORCL as sysdba
```

Or for a local connection:

```bash
sqlplus / as sysdba
```

2. Create the User

Replace `<username>` and `<password>` with the desired username and a secure password. Assign a default tablespace (e.g., `USERS`) and a temporary tablespace (e.g., `TEMP`).

```sql
CREATE USER <username>
  IDENTIFIED BY <password>
  DEFAULT TABLESPACE USERS
  TEMPORARY TABLESPACE TEMP
  QUOTA UNLIMITED ON USERS;
```

Example:

```sql
CREATE USER app_user
  IDENTIFIED BY StrongPwd#2024
  DEFAULT TABLESPACE USERS
  TEMPORARY TABLESPACE TEMP
  QUOTA UNLIMITED ON USERS;
```

3. Grant Permissions

Typically, new users require at least the `CREATE SESSION` privilege to log in. You may also grant object creation privileges, or assign roles like `DBA` (with caution).

```sql
-- Basic connect privilege
GRANT CREATE SESSION TO <username>;

-- (Optional) Privileges to create objects:
GRANT CREATE TABLE, CREATE VIEW, CREATE PROCEDURE, CREATE SEQUENCE TO <username>;
```

Example:

```sql
GRANT CREATE SESSION TO app_user;
GRANT CREATE TABLE, CREATE VIEW TO app_user;
```

4. (Optional) Grant Roles

For full administrative access (not usually recommended for application users):

```sql
GRANT DBA TO <username>;
```

5. Verify the User

Connect using the new user credentials:

```bash
sqlplus app_user@ORCL
```

or

```bash
sqlplus app_user/StrongPwd#2024@ORCL
```

Notes

  • Always use strong, complex passwords for database accounts.
  • Minimize granted privileges—apply the principle of least privilege.
  • If connecting remotely, ensure the listener and network configuration permit access.
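
When provisioning extraction users across several environments, the statements above can also be generated programmatically. The following Python sketch is a hypothetical helper (not part of Oracle or Matillion tooling): it builds the `CREATE USER` and basic `GRANT` statements, validating identifiers first, since Oracle identifiers cannot be passed as bind variables. A driver such as python-oracledb could then execute each statement in turn.

```python
import re

def build_create_user_sql(username: str, password: str,
                          default_ts: str = "USERS",
                          temp_ts: str = "TEMP") -> list:
    """Build the CREATE USER and basic GRANT statements shown above.

    Identifiers are validated against Oracle's classic naming rules
    (leading letter, max 30 chars) before interpolation, because they
    cannot be supplied as bind variables.
    """
    ident = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]{0,29}$")
    for name in (username, default_ts, temp_ts):
        if not ident.match(name):
            raise ValueError(f"invalid Oracle identifier: {name!r}")
    if '"' in password:
        raise ValueError("double quotes are not allowed in the password")
    return [
        f'CREATE USER {username} IDENTIFIED BY "{password}" '
        f"DEFAULT TABLESPACE {default_ts} TEMPORARY TABLESPACE {temp_ts} "
        f"QUOTA UNLIMITED ON {default_ts}",
        f"GRANT CREATE SESSION TO {username}",
    ]

statements = build_create_user_sql("app_user", "StrongPwd#2024")
for stmt in statements:
    print(stmt)
```

Each returned string could be passed to `cursor.execute()` on a privileged connection; adjust the grants to follow least privilege for your use case.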

Checking the JDBC driver

The JDBC driver for Oracle Database is already installed in Matillion Data Productivity Cloud, so there is no need to download or install it separately. To confirm the presence of the JDBC driver, follow these steps:

  1. Create a new Orchestration pipeline in your Matillion Data Productivity Cloud project.
  2. Add a "Database Query" component to your pipeline.
  3. In the settings for the "Database Query" component, locate the Database Type dropdown menu.
  4. Look for Oracle Database in the list of available database types. If it appears in the dropdown, the JDBC driver is installed and ready to use.

If you need to verify which version of the JDBC driver is currently in use, please refer to the detailed instructions provided here: Database Driver Versions.

Checking network connectivity

To enable successful connectivity between Matillion Data Productivity Cloud and your Oracle Database, you must ensure that the database is configured to allow incoming connections according to the type of Matillion deployment you are using:

  • Full SaaS Agent:
    If you are using a Full SaaS agent configuration, the Oracle Database must allow incoming connections from the IP addresses listed at:
    Matillion Data Productivity Cloud IP Allowlist Documentation

  • Hybrid SaaS Deployment:
    For Hybrid SaaS deployments, incoming connections from your own virtual private cloud (VPC) must be permitted. To assist in checking network access from your VPC, you can use utilities provided by Matillion at:
    Matillion Network Access Checker

Additionally, if your Oracle Database is referenced via a DNS hostname (rather than a direct IP address), ensure that the Full SaaS or Hybrid SaaS agent is able to resolve the hostname to the correct IP address, as DNS resolution is required for connectivity.
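
A quick way to pre-check both requirements (DNS resolution and network reachability) from a machine inside the relevant network is a small script. The Python sketch below is an illustrative helper rather than a Matillion utility, and it assumes the default Oracle listener port 1521:

```python
import socket

def check_database_endpoint(host: str, port: int = 1521,
                            timeout: float = 5.0) -> str:
    """Resolve a hostname and attempt a TCP connection to the
    Oracle listener port, reporting which step (if any) failed."""
    try:
        ip = socket.gethostbyname(host)  # DNS resolution step
    except socket.gaierror as exc:
        return f"DNS resolution failed for {host}: {exc}"
    try:
        # TCP reachability step (does not authenticate to Oracle)
        with socket.create_connection((ip, port), timeout=timeout):
            return f"OK: {host} resolved to {ip}, port {port} reachable"
    except OSError as exc:
        return f"{host} resolved to {ip}, but port {port} unreachable: {exc}"

print(check_database_endpoint("localhost", 1521))
```

A successful TCP connection only confirms the listener port is open; authentication and listener service checks still happen at connection time.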

Querying Data from an Oracle Database

This guide provides technical instructions for querying data from an Oracle Database using SQL `SELECT` statements, and discusses how to integrate these queries into data loading workflows such as those in Matillion ETL, including datatype conversion and loading strategies.


Oracle Database SQL Query Examples

To query data in Oracle Database, use the standard `SELECT` statement syntax. Here are some common examples:

```sql
-- Select all columns from an "employees" table
SELECT * FROM employees;

-- Select specific columns with a filter
SELECT employee_id, first_name, last_name, hire_date
FROM employees
WHERE department_id = 10;

-- Aggregate function usage example
SELECT department_id, COUNT(*) AS employee_count
FROM employees
GROUP BY department_id;

-- Join two tables
SELECT e.employee_id, e.first_name, d.department_name
FROM employees e
JOIN departments d ON e.department_id = d.department_id;
```


Datatype Conversion: Oracle Database vs Redshift

When loading data from Oracle Database into Amazon Redshift, different database systems' datatypes may require explicit conversion. Examples of common Oracle-to-Redshift datatype mappings include:

| Oracle Datatype | Redshift Equivalent |
| --- | --- |
| NUMBER(p,s) | DECIMAL(p,s) |
| VARCHAR2(n) | VARCHAR(n) |
| DATE | DATE |
| TIMESTAMP | TIMESTAMP |
| CLOB | VARCHAR(65535) |
| BLOB | Not directly supported |

Careful review of your schema, and use of CAST/CONVERT functions in SQL or in your ETL tool, may be necessary to ensure compatibility.
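
As a sketch of how such a mapping might be automated, the Python function below (hypothetical, mirroring only the table above) translates common Oracle column types to their Redshift equivalents; in practice you would extend it to cover every type in your actual schema:

```python
# Illustrative mapping only, based on the table above; verify against
# your actual schema before relying on it.
ORACLE_TO_REDSHIFT = {
    "NUMBER": "DECIMAL",
    "VARCHAR2": "VARCHAR",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP",
    "CLOB": "VARCHAR(65535)",
}

def map_column_type(oracle_type, precision=None, scale=None, length=None):
    """Translate a single Oracle column type to a Redshift type string."""
    base = oracle_type.upper()
    if base == "NUMBER" and precision is not None:
        return f"DECIMAL({precision},{scale or 0})"
    if base == "VARCHAR2" and length is not None:
        return f"VARCHAR({length})"
    if base == "BLOB":
        raise ValueError("BLOB has no direct Redshift equivalent")
    try:
        return ORACLE_TO_REDSHIFT[base]
    except KeyError:
        raise ValueError(f"no mapping defined for {oracle_type}")

print(map_column_type("NUMBER", precision=10, scale=2))  # DECIMAL(10,2)
print(map_column_type("VARCHAR2", length=200))           # VARCHAR(200)
print(map_column_type("CLOB"))                           # VARCHAR(65535)
```

Types with no direct equivalent, such as BLOB, raise an error here so the schema review is forced to handle them explicitly rather than silently.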


Data Load Patterns: Initial and Incremental Loads

The best practice for replicating data from Oracle Database is to use a two-phase data load strategy with the same `Database Query` component:

  • Initial Load: Perform a one-time extraction of all existing data.
  • Incremental Loads: Subsequently, load only changes (new or updated rows) since the last extraction.

Initial Load

On the first run, your `Database Query` component should not use any row-based filters; extract the entire dataset:

```sql
SELECT * FROM employees;
```

Incremental Loads

For future loads, filter on a suitable column (typically a timestamp or auto-incrementing key):

```sql
SELECT * FROM employees
WHERE hire_date > TO_DATE(:last_extracted, 'YYYY-MM-DD HH24:MI:SS');
```

Replace `:last_extracted` with the appropriate last extraction value, often managed by your ETL workflow.
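
Outside of Matillion, this watermark logic can be sketched as a small helper that emits the full-extract query when no watermark exists yet, and the filtered query afterwards. The Python function below is an illustrative example (within Matillion itself, the last-extracted value would typically be held in a pipeline variable):

```python
from datetime import datetime
from typing import Optional

def build_incremental_query(table: str, filter_column: str,
                            last_extracted: Optional[datetime]) -> str:
    """Return the initial-load query when no watermark exists yet,
    otherwise a query filtered to rows changed since the watermark."""
    if last_extracted is None:
        return f"SELECT * FROM {table}"
    ts = last_extracted.strftime("%Y-%m-%d %H:%M:%S")
    return (f"SELECT * FROM {table} WHERE {filter_column} > "
            f"TO_DATE('{ts}', 'YYYY-MM-DD HH24:MI:SS')")

# First run: initial load (no watermark stored yet)
print(build_incremental_query("employees", "hire_date", None))
# Subsequent runs: incremental load from the stored watermark
print(build_incremental_query("employees", "hire_date",
                              datetime(2024, 6, 1)))
```

After each successful load, the workflow would persist the maximum value of the filter column seen in that batch as the next watermark.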

For a comprehensive strategy discussion, refer to Matillion Incremental Load Data Replication Strategy.


Note: Ensure the chosen filter column (e.g., `hire_date`, `last_updated`, or a surrogate key) reliably reflects new or updated data, so incremental loads are correct. Reusing the same `Database Query` component, changing only the filter clause, aligns with best practice for Oracle-to-Redshift ETL processes.

Data Integration Architecture

Loading data in advance of integration is a core strategy for dividing and conquering the data integration challenge, breaking the process into two distinct steps: loading, then transforming the data. This modular approach is a key advantage of the ELT (Extract, Load, Transform) architecture, as it allows raw data to be ingested quickly before any complex integration logic is applied. Effective data integration inherently requires robust data transformations, and the most reliable way to achieve this is through structured data transformation pipelines, which ensure consistency, traceability, and efficiency when processing large datasets.

Another significant benefit of the ELT architecture is that all data transformation and integration tasks are performed inside the target Redshift database itself. As a result, transformations are fast, occur on demand, and scale seamlessly as data volumes grow, all while eliminating the need to invest in, maintain, or manage separate data processing infrastructure.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.