Integrate data from Altibase to Databricks using Matillion

The Altibase to Databricks connector streamlines data transfer to Databricks in just minutes, ensuring your data remains current without requiring manual coding or intricate ETL scripts.


Extracting data from Altibase to Databricks

Extracting data from Altibase is a common requirement for organizations looking to centralize analytics or modernize their data platforms. As cloud-based solutions like Databricks become increasingly popular, efficiently moving data from Altibase into these environments is essential for deriving actionable insights. This article provides a step-by-step guide to help you extract data from Altibase and load it into Databricks, ensuring a smooth and secure transfer process.

We will begin by detailing how to create an appropriate user identity in Altibase, which is necessary to perform queries and access the required datasets. For organizations or users leveraging Matillion, we will explain how to check for an existing JDBC driver or obtain one suitable for connecting to Altibase. Network connectivity between your source (Altibase) and target (Databricks) environments will also be discussed, helping you avoid common pitfalls that can interrupt data pipelines.

Finally, we will cover querying options—outlining how to perform both full initial loads and set up incremental data extraction for efficient ongoing transfers. By following this guide, you can ensure a robust ETL process tailored to your technology stack.


What is Altibase?

Altibase is a high-performance, hybrid relational database management system (RDBMS) renowned for its ability to support both in-memory and disk-based data storage within a single unified engine. Designed with enterprise-grade applications in mind, Altibase offers ACID compliance, robust scalability, and advanced SQL compatibility, making it well-suited for environments demanding low-latency transaction processing alongside large-capacity data management. Its architecture enables seamless migration and real-time data movement between in-memory and disk tables, providing flexibility for data-intensive workloads such as telecommunications, finance, and manufacturing. In addition, Altibase includes native replication, clustering, and backup features, ensuring high availability and disaster recovery for mission-critical systems.


What is Databricks?

Databricks is a unified analytics platform optimized for big data and machine learning. It uses Delta Lake, an open-source storage layer that adds ACID transactional guarantees to cloud-based data lakes (e.g., Amazon S3, Azure, Google Cloud). Data is stored in open Parquet format, with Delta Lake providing reliability via transaction logs, schema enforcement, and scalable metadata management. Databricks supports both SQL and Apache Spark workloads, enabling efficient ETL, ad hoc queries, analytics, collaborative notebooks, and integrated machine learning—all within a secure, enterprise-grade environment that ensures strong data governance.

Why Move Data from Altibase into Databricks

Unlocking Advanced Analytics: The Case for Copying Data from Altibase to Databricks

A data engineer or architect may wish to copy data from Altibase into Databricks for several compelling reasons. Firstly, Altibase is likely to hold valuable transactional or operational data that could generate deeper insights if analyzed effectively. By integrating this data with datasets from other systems within Databricks, organizations can uncover patterns, perform advanced analytics, and enable richer business intelligence that would be difficult to achieve using Altibase alone. Importantly, utilizing Databricks for this data integration ensures that the heavy processing workloads, such as data transformations, analytics, and machine learning model training, are carried out on a dedicated analytics platform. This approach prevents any unnecessary strain on the Altibase environment, thereby maintaining its performance for mission-critical operational tasks.

Creating a User in Altibase Database

To create a new user in an Altibase database, you must have sufficient privileges (typically DBA or SYSDBA). The following steps outline the procedure for creating a user, assigning a password, and granting basic privileges.

1. Connect to the Altibase Database

First, connect to your Altibase instance using isql, Altibase's command-line SQL tool, or any compatible client:

isql -u sys -p <SYS_PASSWORD> -s <SERVER_IP> -n <PORT>

  • <SYS_PASSWORD>: replace with the SYS user's password
  • <SERVER_IP>: the database server address
  • <PORT>: the Altibase port (default is 20300)

2. Create a User

Use the following SQL command to create a new user, replacing <username> and <password> with your desired values.

CREATE USER <username> IDENTIFIED BY <password>;

Example:

CREATE USER alice IDENTIFIED BY SecureP@ssw0rd;

3. Grant Privileges

At minimum, the new user will need the CONNECT privilege; grant additional roles or privileges as necessary.

GRANT CONNECT TO <username>;

Example:

GRANT CONNECT TO alice;

To allow the user to create tables and other resources:

GRANT RESOURCE TO <username>;

4. (Optional) Grant DBA Privileges

If the user needs database administrator rights:

GRANT DBA TO <username>;

Note: This should only be granted to trusted users.

5. Verify User Creation

You can verify that the user has been created by querying the system tables:

SELECT USER_NAME FROM SYSTEM_.SYS_USERS_ WHERE USER_NAME = '<USERNAME>';

Example:

SELECT USER_NAME FROM SYSTEM_.SYS_USERS_ WHERE USER_NAME = 'ALICE';

Example: Full Script

CREATE USER alice IDENTIFIED BY SecureP@ssw0rd;
GRANT CONNECT, RESOURCE TO alice;

Replace usernames and passwords as required for your security policies.

Installing the JDBC Driver

At the time of writing, the JDBC driver for Altibase is not included by default in Matillion Data Productivity Cloud, due to licensing or redistribution restrictions. You must manually download and install the appropriate driver in order to connect to an Altibase database.

1. Download the Altibase JDBC Driver

  • Visit https://www.altibase.com/product/.
  • Search for and download the Altibase JDBC driver. Preferably, obtain the Type 4 JDBC driver: as a pure Java driver, it is generally the most suitable choice for integration with cloud systems like Matillion.

2. Review Compatibility

  • Ensure the version of the JDBC driver matches both your Altibase database version and the requirements of your Matillion Data Productivity Cloud deployment.

3. Upload the Driver to Matillion Data Productivity Cloud

  • Matillion requires manual uploading of external JDBC drivers in some scenarios. Instructions for this process can be found here: https://docs.matillion.com/data-productivity-cloud/agent/docs/uploading-external-drivers/.
  • Follow the documented process, which typically involves accessing the user interface or your agent installation, uploading the downloaded JDBC driver JAR file, and ensuring the platform recognises the new driver.

4. Configure and Use the Driver

  • After uploading, configure your connection settings in Matillion to use the new JDBC driver for Altibase. For detailed usage and configuration instructions, refer to the usage documentation: https://docs.matillion.com/data-productivity-cloud/designer/docs/database-query/.
  • Ensure that authentication, endpoints, and any required parameters specific to Altibase are correctly specified.

By following these steps and referencing the linked documentation, you can successfully enable Altibase connectivity in your Matillion environment.
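As a quick sanity check outside Matillion, it can help to assemble the JDBC connection URL yourself before configuring the pipeline. The sketch below is a hypothetical helper: it assumes the commonly documented Altibase URL form (jdbc:Altibase://host:port/database) and, in the comment, the usual Altibase driver class name — verify both against the driver you actually downloaded.

```python
def altibase_jdbc_url(host: str, port: int = 20300, database: str = "mydb") -> str:
    """Build a JDBC URL for an Altibase Type 4 driver (assumed URL format)."""
    return f"jdbc:Altibase://{host}:{port}/{database}"


if __name__ == "__main__":
    url = altibase_jdbc_url("db.example.com", 20300, "sales")
    print(url)  # jdbc:Altibase://db.example.com:20300/sales
    # With the driver JAR in place, a connection attempt from Python might
    # look like the following (illustrative only; requires the jaydebeapi
    # package, a JVM, and the downloaded JAR):
    # import jaydebeapi
    # conn = jaydebeapi.connect("Altibase.jdbc.driver.AltibaseDriver",
    #                           url, ["alice", "SecureP@ssw0rd"],
    #                           jars="Altibase.jar")
```

The same URL, driver class, and credentials are what you would enter into Matillion's connection settings in step 4 above.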

Checking network connectivity

To ensure successful connectivity between Matillion Data Productivity Cloud and your Altibase database, you must verify that the database allows incoming network connections from wherever your agent runs, according to your deployment configuration. Check firewall rules and security groups so that the Altibase listener port (default 20300) is reachable from that network.

Additionally, if you are connecting to the Altibase database using a DNS name rather than a fixed IP address, make sure that your Full SaaS or Hybrid SaaS agent is able to resolve the DNS name successfully to the correct database endpoint. Proper DNS resolution is required for establishing network connectivity.
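Both checks — DNS resolution and TCP reachability of the listener port — can be verified from the agent's network before touching the pipeline. This is a minimal diagnostic sketch using only the Python standard library; host and port are placeholders for your environment.

```python
import socket


def check_db_reachable(host: str, port: int = 20300, timeout: float = 5.0) -> bool:
    """Resolve the DNS name, then attempt a TCP connection to the DB port."""
    try:
        # DNS resolution step: fails here if the agent cannot resolve the name.
        addrinfo = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        print(f"DNS resolution failed for {host}")
        return False
    ip, resolved_port = addrinfo[0][4][:2]
    try:
        # Network connectivity step: fails here if a firewall blocks the port.
        with socket.create_connection((ip, resolved_port), timeout=timeout):
            return True
    except OSError as exc:
        print(f"TCP connection to {host}:{port} failed: {exc}")
        return False
```

If the first step fails, fix DNS configuration for the Full SaaS or Hybrid SaaS agent; if the second fails, review firewall and security group rules on the database side.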

Querying Data from an Altibase Database

Running SQL Queries with Altibase

Altibase uses standard SQL for querying data. Below are examples of typical SELECT statements:

1. Basic SELECT Statement

SELECT * FROM customers;

2. Filtering Data

SELECT id, name FROM users WHERE status = 'active';

3. Ordering Results

SELECT order_id, total FROM orders ORDER BY total DESC;

4. Aggregation

SELECT department, COUNT(*) AS total_employees
FROM employees
GROUP BY department;

Considerations for Datatype Conversion

When migrating or integrating data between Altibase and Databricks, be aware that automatic datatype conversion may occur. For example:

  • TEXT or VARCHAR in Altibase may map to STRING in Databricks.
  • NUMBER or INTEGER types may map to INT or BIGINT.
  • DateTime fields may require conversion to TIMESTAMP.

Always review and validate field types when designing your ETL process.
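One lightweight way to make that review systematic is to keep the mapping as data in your pipeline tooling. The table below is illustrative, not exhaustive — the entries follow the examples above, and the DECIMAL precision is an assumption you should tune to your data; confirm each mapping against both platforms' documentation.

```python
# Illustrative Altibase -> Databricks type mapping (assumption: defaults
# chosen for this sketch, not an official conversion table).
ALTIBASE_TO_DATABRICKS = {
    "CHAR": "STRING",
    "VARCHAR": "STRING",
    "INTEGER": "INT",
    "BIGINT": "BIGINT",
    "NUMBER": "DECIMAL(38,10)",  # assumed precision/scale; adjust to your data
    "DATE": "TIMESTAMP",         # Altibase DATE carries a time component
    "BLOB": "BINARY",
}


def map_type(altibase_type: str) -> str:
    """Look up the Databricks type, defaulting to STRING for unknown types."""
    return ALTIBASE_TO_DATABRICKS.get(altibase_type.upper(), "STRING")
```

Defaulting unknown types to STRING is a conservative choice: the load succeeds and the value can be cast later in a transformation pipeline.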

Patterns for Data Loading

A common pattern when loading data from Altibase is to use two phases:

1. Initial (Full) Load
2. Incremental Load

This approach ensures efficiency by syncing the entire dataset once and then updating only with new or changed data.

Both load types typically use the same "Database Query" component in your ETL pipeline.

Initial (Full) Load Example

For a full load, you query the entire table, with no filter clause:

SELECT * FROM transactions;

Incremental Load Example

For incremental loads, use a filter—typically on a column such as a timestamp or an incrementing ID:

SELECT * FROM transactions WHERE modified_time > '2024-06-01 00:00:00';

Replace modified_time and the timestamp as appropriate for your schema and load window.

For further details on incremental load strategies, see Incremental Load - Data Replication Strategy.
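The two-phase pattern can be sketched as a small query builder that switches on a stored high-water mark: no mark yet means a full load, otherwise an incremental load filtered on the watermark column. Table and column names here are illustrative, matching the transactions example above.

```python
from typing import Optional


def build_extract_sql(table: str, watermark_col: str,
                      last_value: Optional[str]) -> str:
    """Full load when no watermark exists; otherwise filter on it.

    Note: values are interpolated directly for readability. In a real
    pipeline, use your tool's parameter binding to avoid SQL injection.
    """
    if last_value is None:
        return f"SELECT * FROM {table}"
    return f"SELECT * FROM {table} WHERE {watermark_col} > '{last_value}'"
```

After each successful run, persist the maximum modified_time seen so the next run passes it in as last_value.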

Data Integration Architecture

Loading data in advance of integration offers a practical "divide and conquer" approach by separating the integration process into two manageable steps: first, data is loaded into the target environment, and second, it is integrated and transformed as needed. This separation is a key advantage of the Extract, Load, and Transform (ELT) architecture. Effective data integration invariably requires reliable data transformation routines, and the most robust method for accomplishing this is through purpose-built data transformation pipelines. In the ELT paradigm, these transformation and integration processes are performed directly within the target environment—in this case, Databricks. This offers substantial benefits: data transformation and integration become fast, scalable, and available on-demand, without the need for additional investment in dedicated data processing infrastructure.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.