Integrate data from Altibase to Amazon Redshift using Matillion

Our Altibase to Redshift connector seamlessly migrates your data to Redshift within minutes, ensuring it remains current without the need for manual coding or intricate ETL processes.

Altibase
Amazon Redshift
Altibase to Amazon Redshift banner

Extracting data from Altibase to Amazon Redshift

Extracting data from Altibase is a crucial step for organizations aiming to integrate legacy or operational data with modern analytics platforms such as Amazon Redshift. This process enables teams to leverage flexible, scalable cloud data warehouses for enhanced reporting and analytics. In this article, we will guide you through the end-to-end journey of data extraction and loading—from securing access to your Altibase instance to incrementally syncing your data within Redshift environments. Specifically, we will cover the following topics: - **Creating an identity in Altibase:** Setting up and managing the necessary user credentials to enable secure data extraction. - **For Matillion users, checking or acquiring the JDBC driver:** Ensuring that Matillion ETL is equipped with the appropriate Altibase JDBC driver to facilitate seamless connectivity. - **Network connectivity from source to target:** Verifying and configuring network paths between your Altibase and Redshift instances to allow reliable data transfer. - **Querying data, initially and incrementally:** Crafting effective queries for both full initial loads and efficient incremental updates. By following these steps, you will be equipped to streamline data migration from Altibase to Redshift, supporting both immediate and ongoing analytics needs.


What is Altibase?

Altibase is a high-performance, enterprise-grade RDBMS featuring a unique hybrid architecture that unifies in-memory and disk-based storage in one engine. Developed in South Korea, it is ACID-compliant, uses standard SQL, and supports both transactional and analytical workloads. Its design enables rapid, memory-based operations alongside disk storage for larger, less time-sensitive data—all under one relational schema. Altibase delivers scalable OLTP performance, high availability, and is deployed in sectors like telecommunications, finance, and manufacturing requiring real-time processing. Available as open source, Altibase gives organizations broad, cost-effective access to its advanced database capabilities.

matillion logo x Altibase

What is Amazon Redshift?

Amazon Redshift is a fully managed, petabyte-scale cloud data warehouse for fast, efficient analytics and reporting. It uses columnar storage and parallel queries for high performance on large datasets and integrates with AWS. Supporting standard SQL, Redshift suits both traditional and cloud-native users. It features decoupled storage and compute for elastic scaling and cost savings. With Redshift Spectrum, users can query exabytes of data in Amazon S3 without moving data. Robust security, automated backups, and modern data sharing make Redshift a strong choice for enterprises and data engineers.

Why Move Data from Altibase into Amazon Redshift

Unlocking Advanced Analytics: Copying Data from Altibase to Amazon Redshift

A data engineer or architect may wish to copy data from Altibase into Amazon Redshift for several compelling reasons. Firstly, Altibase may contain transactional or historical data that is potentially valuable for analytics, reporting, or business intelligence initiatives. However, the true value of this data is often realized when it is integrated with information from other sources, enabling more comprehensive insights and decision-making. By using Redshift—a scalable, cloud-based data warehouse—to perform the data integration and analytical processing, organizations can avoid placing additional queries or workload directly on the operational Altibase system. This approach not only preserves the performance of Altibase for mission-critical applications but also leverages the advanced analytical capabilities and scalability of Redshift, ultimately maximizing the utility of the data.

Creating a User in Altibase Database

To create a new user (identity) in an Altibase database, you will use the

CREATE USER
SQL statement. You must have sufficient privileges (typically as the
SYS
user or another user with DBA rights) to perform this action.

Steps

  1. Connect to Altibase Database

Establish a connection to your Altibase database using the ALTIBASE client (ALTISQL) or any other SQL tool that supports Altibase.

ell
   altisql -h <HOST> -u <ADMIN_USER> -p <PASSWORD> -s <SERVICE_NAME>

Replace

<HOST>
,
<ADMIN_USER>
,
<PASSWORD>
, and
<SERVICE_NAME>
as appropriate for your environment.

  1. Create a New User

Use the following SQL syntax to create a user:

CREATE USER username IDENTIFIED BY "password";

  • username
    : The desired username for the new user.
  • "password"
    : The password, enclosed in double quotes. Altibase passwords are case-sensitive and must be between 4 and 30 bytes.

Example:

CREATE USER app_user IDENTIFIED BY "StrongP@ssw0rd";

  1. Grant Privileges to the New User (Optional)

After creating the user, you may want to grant necessary privileges so the user can perform specific tasks.

Example:

GRANT CONNECT TO app_user;
   GRANT RESOURCE TO app_user;
   -- Grant additional object or system privileges as needed

  1. Verify User Creation (Optional)

You can verify that the user was created successfully by querying the system view:

SELECT USERNAME FROM SYS.USERS WHERE USERNAME = 'APP_USER';

Notes

  • Usernames in Altibase are not case-sensitive, but passwords are case-sensitive.
  • Ensure adherence to your organization's security policies when assigning passwords and privileges.

Installing the JDBC driver

To enable connectivity between Matillion Data Productivity Cloud and Altibase databases, it is necessary to manually acquire and install the Altibase JDBC driver. Currently, this JDBC driver is not included in Matillion Data Productivity Cloud by default, due to licensing and redistribution restrictions. The following steps will guide you through obtaining and installing the required driver.

1. Downloading the Altibase JDBC Driver

  1. Navigate to the official download page for Altibase at https://www.altibase.com/product/.
  2. On the Altibase website, locate the section for database client downloads or JDBC drivers.
  3. When presented with multiple options, select a Type 4 JDBC driver. The Type 4 driver (also known as a "thin" or "pure Java" driver) is preferred for Matillion Data Productivity Cloud as it does not require any native library installations.
  4. Download the appropriate distribution of the driver file (typically provided as a single
    .jar
    file) and ensure it is saved in a location accessible from your workstation or wherever you manage your Matillion agent deployments.

2. Installing the JDBC Driver in Matillion Data Productivity Cloud

With the Altibase JDBC driver downloaded, you must next install it into your Matillion environment. Detailed installation instructions are maintained in Matillion's documentation: Uploading external drivers.

The high-level steps are:

  • Access your Matillion Agent instance where you want to use the Altibase connection.
  • Follow the documented steps to upload a new external JDBC driver, specifying the downloaded
    .jar
    file.
  • Restart or refresh the agent as recommended in the documentation to ensure the new driver is loaded.

3. Using the Installed Driver

Once the driver is installed and loaded by your Matillion Agent, you can begin configuring database queries that interact with Altibase. For details on how to set up and use database queries in Matillion Data Productivity Cloud, refer to: Database Query usage documentation.

Be sure to select the Altibase driver when building your connection, and provide any further connection details as required by your Altibase database environment.

Checking network connectivity

To ensure successful connectivity between the Matillion Data Productivity Cloud and your Altibase database, you must configure your database to accept incoming connections according to your specific deployment type:

  • Full SaaS agent configuration:
    You need to allow incoming connections to your Altibase database from the static IP addresses used by Matillion Cloud agents. The full list of these IP addresses is available at:
    Matillion Allowed IP Addresses

  • Hybrid SaaS deployment:
    In this configuration, database connectivity will originate from your own environment, specifically your virtual private cloud (VPC). Ensure that your Altibase database allows inbound connections from your VPC's IP address range. To help verify network connectivity in a Hybrid SaaS deployment, you can use the utilities provided at:
    Matillion Data Productivity Cloud Network Check Utilities

Additionally, if your Altibase database is referenced by a DNS name rather than an IP address, you must ensure that the Full SaaS or Hybrid SaaS agent is able to resolve that DNS address to connect successfully. This may include verifying DNS records and making sure any required DNS servers or services are accessible from the Matillion environment.

Querying Data from an Altibase Database

This guide covers how to query data from an Altibase database, with annotated SQL examples. It also highlights datatype conversion considerations between Altibase and Amazon Redshift, and recommends best practices for data extraction workflows.


Example: SQL SELECT Statements for Altibase

To fetch data from Altibase, you use familiar SQL syntax:

1. Select All Records

SELECT * FROM EMPLOYEE;

2. Select Specific Columns

SELECT EMP_ID, NAME, DEPT FROM EMPLOYEE;

3. Filtered Selection

SELECT EMP_ID, SALARY FROM EMPLOYEE WHERE DEPT = 'HR';

4. Aggregation and Grouping

SELECT DEPT, AVG(SALARY) AS AVG_SALARY FROM EMPLOYEE GROUP BY DEPT;


Datatype Conversion Considerations

When migrating or replicating data from Altibase to Redshift, automatic datatype conversion might occur. Typical differences include:

  • Numeric Types:
    NUMBER
    in Altibase may be mapped to
    DECIMAL
    in Redshift.
  • String Types:
    VARCHAR
    , although supported by both, might have different max lengths.
  • Date/Time:
    DATE
    ,
    TIMESTAMP
    formats and handling may vary. Always confirm their equivalence between the systems.
  • Boolean: Altibase does not have a dedicated BOOLEAN type; often implemented as
    CHAR('Y'/'N')
    or
    NUMBER(1)
    . Redshift has a native BOOLEAN type.

Test conversions to ensure data integrity.


Data Extraction Workflow: Initial and Incremental Loads

For data movement (e.g., using an ETL platform like Matillion), a two-phase pattern is recommended:

1. Initial (Full) Load

  • Purpose: Extract the entire dataset.
  • Query Pattern: No filter clause.

-- Example: Full table load (no WHERE clause)
SELECT * FROM EMPLOYEE;

2. Incremental Load

  • Purpose: Extract only new or modified records since the last update.
  • Query Pattern: Use a WHERE clause with a filter on a timestamp column or incrementing key.

-- Example: Incremental load using 'updated_at' column
SELECT * FROM EMPLOYEE WHERE UPDATED_AT > '2024-06-01 00:00:00';

Note: Use the same Database Query component for both types of loads. For the incremental pattern, parameterize the filter for maintainability.


For more incremental loading techniques and strategies, see: Matillion Exchange: Incremental Load Data Replication Strategy

Data Integration Architecture

Loading data into Amazon Redshift in advance of integration is an effective way to divide and conquer the complexities of data workflow, as it splits the process into two manageable steps: loading (extract and load) and integration (transform). This approach exemplifies the advantages of ELT (Extract, Load, Transform) architecture, where raw data is first ingested into the database and subsequent integration tasks—such as data cleansing, joining, or aggregations—are handled using transformation pipelines within Redshift. Employing data transformation pipelines is considered best practice because they allow complex transformations to be performed efficiently and programmatically. Moreover, a significant benefit of the ELT model is that all data transformations and integrations are executed directly inside the target Redshift environment. This design ensures that data processing is both fast and scalable since Redshift is optimized for high-performance workloads, and it eliminates the need to provision or maintain separate data processing infrastructure, thereby reducing operational costs.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.