Integrate data from Teradata to Amazon Redshift using Matillion

Our Teradata to Redshift connector enables seamless data transfer to Redshift within minutes, ensuring your information remains current without the need for manual coding or intricate ETL processes.


Extracting data from Teradata to Amazon Redshift

Extracting data from Teradata is a common requirement for organizations migrating workloads or integrating with cloud-based data platforms. Amazon Redshift offers a scalable and high-performance destination, but a successful transfer entails several preparatory steps and careful planning.

In this article, we will guide you through the key stages of the migration process. We will begin by outlining how to create an identity in Teradata, ensuring that you have the necessary privileges for data access and extraction. For Matillion users, we will explain how to check for the presence of—and, if necessary, acquire—the appropriate JDBC driver, which enables seamless connectivity between Matillion, Teradata, and Redshift. Next, we will discuss how to confirm and configure the required network connectivity between your on-premises or existing cloud environments and your Redshift target.

Finally, we will address strategies for querying and transferring data, exploring both initial full loads and techniques for handling ongoing incremental updates. By the end of this article, you will have a clear understanding of the essential steps involved in moving data efficiently and securely from Teradata to Redshift.


What is Teradata?

Teradata is a massively parallel processing (MPP) relational database management system designed to handle large-scale data warehousing applications and real-time analytics. Built on a shared-nothing architecture, it distributes both data and workload evenly across multiple nodes, ensuring high scalability, performance, and fault tolerance. Teradata supports standard SQL and integrates with a variety of big data and business intelligence tools, enabling complex queries, advanced analytics, and robust data integration. Its flexible deployment options—on-premises, cloud, or hybrid—allow organizations to tailor solutions for data-driven decision-making and operational efficiency. With optimizations for partitioning, compression, and workload management, Teradata remains a leading choice for enterprises managing high-volume, mission-critical data workloads.


What is Amazon Redshift?

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud, designed for fast querying and analysis of large datasets. Utilizing columnar storage and massively parallel processing (MPP) architectures, Redshift enables high-performance analytical querying while optimizing storage efficiency and data compression. It is compatible with standard SQL and integrates seamlessly with a wide range of AWS services, including Amazon S3 for data loading, Amazon EMR for processing, and AWS Glue for ETL operations. Redshift's scalability allows users to easily adjust cluster size based on workload requirements, while its security features—such as VPC isolation, encryption at rest and in transit, and role-based access control—ensure robust data protection.

Why Move Data from Teradata into Amazon Redshift

The Case for Copying Data from Teradata to Redshift for Advanced Analytics

There are several important reasons why a data engineer or architect might choose to copy data from Teradata into Amazon Redshift. Teradata often stores rich and potentially valuable datasets accumulated over years of enterprise operations. By integrating this data with information from other sources within Redshift, organizations can unlock deeper insights and realize greater analytic value than with siloed data alone. Furthermore, performing data integration and analysis directly on Redshift, rather than on Teradata, prevents additional workloads from burdening the Teradata environment, thereby preserving its performance for critical operational tasks. This approach enables scalable analytics while maintaining the stability of existing systems.

Creating a User in Teradata

This guide provides instructions on how to create a user in a Teradata database. You will require appropriate privileges (typically as a Database Administrator or with CREATE USER privilege) to perform these actions.

Prerequisites

  • Access to Teradata SQL Assistant, BTEQ, or a database client that can execute SQL against your Teradata instance.
  • Sufficient space in the immediate owner database for the new user.
  • Optional: A secured password policy (if relevant for your environment).

Basic Syntax

The basic syntax for creating a user in Teradata is:

CREATE USER user_name
  FROM database_name
  AS
    PERM = perm_space
    [ SPOOL = spool_space ]
    [ TEMPORARY = temp_space ]
    [ PASSWORD = 'password' ]
    [ DEFAULT DATABASE = database_name ]
    [ ACCOUNT = 'account_string' ]
    [ PROFILE = profile_name ]
    [ STARTUP = 'startup_expression' ]
    ;

  • user_name: The name of the new user. Must be unique.
  • database_name: The parent database or owner (could be DBC or another database).
  • perm_space: Amount of permanent space (in bytes).
  • spool_space: Optional. Temporary working space (in bytes).
  • temp_space: Optional. Temporary table space (in bytes).
  • password: Initial password for the user.
  • Other options as required (see Teradata documentation for all possibilities).

Example: Create a User

This example creates a user named joe_user within the dbc database, allocating 2 GB of permanent space, 1 GB of spool space, and 500 MB of temporary space, and assigns an initial password.

CREATE USER joe_user
  FROM dbc
  AS
    PERM = 2e9           /* 2 GB */
    SPOOL = 1e9          /* 1 GB */
    TEMPORARY = 500000000 /* 500 MB */
    PASSWORD = 'InitPassw0rd!'
    DEFAULT DATABASE = joe_user;

Note: In practice, always enforce strong password policies per your organization’s standards.

Additional Options

  • Account: To assign a specific account string, add ACCOUNT = '...' in the clause.
  • Profile: To assign a security or resource profile, use PROFILE = profile_name.
  • Default Database: Set the user's default database with DEFAULT DATABASE = database_name.

Granting Roles and Privileges

After creation, you will typically grant necessary privileges or roles:

GRANT SELECT, INSERT, UPDATE ON database_name TO joe_user;

-- Or grant a predefined role:
GRANT my_role TO joe_user;

Replace resource names as appropriate.

Viewing Existing Users

To list users in Teradata, query the DBC.Users or DBC.UsersV system views:

SELECT UserName, PermSpace, SpoolSpace, AccountName
FROM DBC.UsersV;

Important Notes

  • Only users or databases that have available space can create users.
  • User names should conform to your naming standards and should avoid reserved keywords.
  • Be sure to follow your organization’s governance for user lifecycle management.

Installing the JDBC driver

Matillion Data Productivity Cloud enables connectivity to a wide variety of databases using JDBC drivers. Due to licensing or redistribution restrictions, certain JDBC drivers—such as the Teradata JDBC driver—are not included by default. If you wish to connect to a Teradata database, you must manually download and add the JDBC driver to your Matillion environment.

Downloading the Teradata JDBC Driver

Follow these steps to download the appropriate driver:

  1. Visit the Official Download Page
     Access the official Teradata JDBC driver download page at:
     https://downloads.teradata.com/download/connectivity/jdbc-driver

  2. Select the Correct Version
     Look specifically for a Type 4 JDBC driver, as it is designed for 100% Java compatibility and is best suited for integration with Matillion. Ensure you download the driver version that matches your Teradata database version and your corporate Java requirements.

  3. Download the Driver
     The driver package is typically a compressed (.zip) file. Download and extract the contents to a location accessible from your local machine.
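As a quick sanity check after extracting the archive, you can verify that the downloaded jar actually contains the Teradata driver class before uploading it to Matillion. The sketch below assumes the standard driver class name com.teradata.jdbc.TeraDriver; the jar file name is illustrative and will depend on the package you downloaded.

```python
import zipfile

def jar_contains_class(jar_path: str, class_name: str) -> bool:
    """Return True if the jar (a zip archive) contains the given class file."""
    # A .class entry inside a jar uses '/' separators instead of '.'
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()

# Example (path is illustrative -- adjust to where you extracted the download):
# jar_contains_class("terajdbc4.jar", "com.teradata.jdbc.TeraDriver")
```

If the check returns False, you have likely extracted a support library rather than the main driver jar.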

Installing the Driver in Matillion Data Productivity Cloud

Once you have obtained and extracted the Teradata JDBC driver, you need to provide it to your Matillion Data Productivity Cloud environment. Follow the steps below:

  1. Prepare the JAR Files
     Locate the main JDBC .jar files extracted from the download package, and retain all associated library files if the main JDBC driver requires them.

  2. Upload the Driver to Matillion
     Access your Matillion Data Productivity Cloud environment, navigate to the agent or workspace where you want to enable Teradata connectivity, and follow the documented process for external driver upload, reference:
     Matillion Doc – Uploading External Drivers

This page provides a step-by-step guide for uploading .jar files, specifying driver class names, and confirming the driver is available for use.

Using the Teradata JDBC Driver

Once installation is complete, you can begin configuring connections and using the driver within Matillion's database query components. Refer to the Matillion usage documentation for details on setting up connections, specifying connection strings, and running queries.

Be sure to validate your connection and check for any driver-specific configuration steps as outlined in the Teradata and Matillion documentation.

Checking network connectivity

Before connecting Matillion Data Productivity Cloud to your Teradata database, you must ensure that the database allows incoming network connections based on your deployment configuration:

  • Full SaaS Agent Configuration: You must configure your Teradata database to allow incoming connections from the Matillion SaaS agents. The required source IP addresses are listed on the following page: Allowing IP addresses for Matillion Data Productivity Cloud.

  • Hybrid SaaS Deployment: In this configuration, you should allow incoming connections from your own virtual private cloud (VPC). You may use the network connectivity utilities available at Matillion Exchange to help you confirm and diagnose connectivity from your environment.

Additionally, if your Teradata database is referenced by a DNS hostname rather than a direct IP address, you must ensure that the relevant Matillion agent (Full SaaS or Hybrid SaaS) is able to resolve the DNS address to establish a successful connection.
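Before configuring the connection in Matillion, it can save time to confirm from the agent's network that both DNS resolution and TCP reachability work. The sketch below assumes Teradata's default listener port of 1025; the hostname is a placeholder for your own environment.

```python
import socket

def check_endpoint(host: str, port: int, timeout: float = 5.0) -> bool:
    """Resolve the hostname and attempt a TCP connection; True on success."""
    try:
        addr = socket.gethostbyname(host)  # DNS resolution step
        with socket.create_connection((addr, port), timeout=timeout):
            return True
    except OSError:
        # Covers resolution failures, refusals, and timeouts alike
        return False

# Teradata listens on port 1025 by default; adjust for your environment:
# check_endpoint("teradata.internal.example.com", 1025)
```

A False result narrows the problem to networking (firewall rules, allowed IP ranges, or DNS) rather than credentials or driver configuration.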

Querying Data from a Teradata Database

This guide provides instructions and best practices for querying data from a Teradata database, specifically for use cases involving extract and replication to environments such as Amazon Redshift. Instructions cover SQL SELECT query examples, handling datatype conversions, and a recommended pattern for initial and incremental data loads. Additional incremental load strategy information is available at Matillion Exchange.


1. Sample Teradata SQL Queries

Basic SELECT Query

SELECT 
    customer_id,
    customer_name,
    city,
    created_date
FROM 
    sales.customers;

SELECT with Filtering

SELECT 
    order_id,
    order_date,
    amount
FROM 
    sales.orders
WHERE 
    order_date >= DATE '2024-01-01';

SELECT with Join and Aggregation

SELECT 
    c.customer_id,
    c.customer_name,
    COUNT(o.order_id) AS total_orders,
    SUM(o.amount) AS total_spent
FROM 
    sales.customers c
INNER JOIN 
    sales.orders o
    ON c.customer_id = o.customer_id
WHERE 
    o.order_date BETWEEN DATE '2024-01-01' AND DATE '2024-06-30'
GROUP BY 
    c.customer_id, c.customer_name;


2. Datatype Conversion: Teradata to Redshift

When transferring data between Teradata and Amazon Redshift, be aware of potential datatype differences. For example:

  • BYTEINT (Teradata) → SMALLINT or INTEGER (Redshift)
  • NUMBER, DECIMAL(p,s) (Teradata) → NUMERIC(p,s) (Redshift)
  • VARCHAR(n) (Teradata/Redshift) – Usually compatible, but verify maximum sizes.
  • TIMESTAMP(n) (Teradata) → TIMESTAMP (Redshift)
  • DATE (Teradata/Redshift) – Typically compatible.

Always review and test your schema mappings to ensure smooth data movement.
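When generating target DDL for many tables, the mappings above can be expressed as a small translation helper. This is a minimal sketch: the mapping choices mirror the list above and should be verified against your own schema (for example, whether BYTEINT should widen to SMALLINT or INTEGER).

```python
import re

# Illustrative Teradata -> Redshift type mappings; verify against your schema
TYPE_MAP = {
    "BYTEINT": "SMALLINT",
    "TIMESTAMP": "TIMESTAMP",
    "DATE": "DATE",
}

def map_type(teradata_type: str) -> str:
    """Translate a Teradata column type into a Redshift equivalent."""
    t = teradata_type.strip().upper()
    # DECIMAL(p,s) / NUMBER(p,s) keep precision and scale as NUMERIC(p,s)
    m = re.fullmatch(r"(?:DECIMAL|NUMBER)\((\d+),\s*(\d+)\)", t)
    if m:
        return f"NUMERIC({m.group(1)},{m.group(2)})"
    # TIMESTAMP(n) drops its fractional-seconds qualifier
    if re.fullmatch(r"TIMESTAMP\(\d+\)", t):
        return "TIMESTAMP"
    # VARCHAR(n) passes through unchanged (check maximum lengths separately)
    if re.fullmatch(r"VARCHAR\(\d+\)", t):
        return t
    # Fall back to the unchanged type so unknown cases surface during testing
    return TYPE_MAP.get(t, t)
```

For example, map_type("DECIMAL(10,2)") yields "NUMERIC(10,2)", while an unmapped type is returned unchanged so that gaps are visible rather than silently converted.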


3. Best Load Patterns: Initial and Incremental Loads

Initial Load

The initial load involves extracting the entire dataset from the Teradata source without any filtering. This is generally a one-time process to populate the target (e.g., Redshift) with a full snapshot of the data.

Example (No Filter)

SELECT 
    *
FROM 
    sales.orders;

Configuration Notes:
  • Use the Database Query component.
  • Do not specify any WHERE clause.

Incremental Load

After the initial load, perform regular incremental loads to transfer only new or changed records. This conserves resources and minimizes processing time.

Example (With Filter)

Add a filter condition, most commonly on a column such as last_updated or created_date:

SELECT 
    *
FROM 
    sales.orders
WHERE 
    last_updated > '2024-06-20 00:00:00';

Replace the filter value with the timestamp of the last successful load (using a variable or parameter, if supported).

Configuration Notes:
  • Continue to use the same Database Query component.
  • Parameterize the filter value when possible for automation.

For more about incremental load patterns, see: Matillion - Incremental Load Data Replication Strategy
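The initial-load and incremental-load patterns above can be combined behind a single high-watermark variable: when no watermark has been recorded yet, run the unfiltered query; afterwards, filter on it. The sketch below is illustrative only — the table and column names mirror the examples above, and in practice the watermark would come from a pipeline variable or state table, with the value supplied as a bound parameter rather than interpolated text.

```python
def build_incremental_query(table, watermark_column, last_loaded=None):
    """Return a full-load query on the first run, a filtered query afterwards.

    last_loaded: timestamp string of the last successful load, or None for
    the initial load. (Sketch only: a production pipeline should bind this
    value as a parameter instead of formatting it into the SQL text.)
    """
    base = f"SELECT * FROM {table}"
    if last_loaded is None:
        return base + ";"  # initial load: no filter
    return f"{base} WHERE {watermark_column} > '{last_loaded}';"

# First run (full snapshot):
#   build_incremental_query("sales.orders", "last_updated")
# Subsequent runs (changed rows only):
#   build_incremental_query("sales.orders", "last_updated", "2024-06-20 00:00:00")
```

After each successful load, record the maximum last_updated value transferred and pass it in on the next run.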

Data Integration Architecture

Loading data in advance of integration, as practiced in Extract, Load, Transform (ELT) architectures, offers a “divide and conquer” approach by splitting the process into two sequential steps: first, raw data is loaded into the target Redshift database, and then it’s transformed and integrated as needed. This separation not only streamlines operations but also enables businesses to handle workflows more flexibly. Since data integration inherently requires transformation—such as cleaning, harmonizing, and joining data from different sources—the most effective way to manage these tasks is through robust data transformation pipelines. These pipelines can be orchestrated within Redshift itself, leveraging its scalable resources. A further advantage of the ELT model is that all complex processing and integration occur directly inside Redshift, eliminating the need for separate, costly data processing infrastructure. As a result, transformations are executed quickly and can be initiated on-demand, making the overall integration process both efficient and highly scalable.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.