Extracting data from Altibase to Databricks
Extracting data from Altibase is a common requirement for organizations looking to centralize analytics or modernize their data platforms. As cloud-based solutions like Databricks become increasingly popular, efficiently moving data from Altibase into these environments is essential for deriving actionable insights. This article provides a step-by-step guide to help you extract data from Altibase and load it into Databricks, ensuring a smooth and secure transfer process. We will begin by detailing how to create an appropriate user identity in Altibase, which is necessary to perform queries and access the required datasets. For organizations or users leveraging Matillion, we will explain how to check for an existing JDBC driver or obtain one suitable for connecting to Altibase. Network connectivity between your source (Altibase) and target (Databricks) environments will also be discussed, helping you avoid common pitfalls that can interrupt data pipelines. Finally, we will cover querying options—outlining how to perform both full initial loads and set up incremental data extraction for efficient ongoing transfers. By following this guide, you can ensure a robust ETL process tailored to your technology stack.
What is Altibase?
Altibase is a high-performance, hybrid relational database management system (RDBMS) renowned for its ability to support both in-memory and disk-based data storage within a single unified engine. Designed with enterprise-grade applications in mind, Altibase offers ACID compliance, robust scalability, and advanced SQL compatibility, making it well-suited for environments demanding low-latency transaction processing alongside large-capacity data management. Its architecture enables seamless migration and real-time data movement between in-memory and disk tables, providing flexibility for data-intensive workloads such as telecommunications, finance, and manufacturing. In addition, Altibase includes native replication, clustering, and backup features, ensuring high availability and disaster recovery for mission-critical systems.
What is Databricks?
Databricks is a unified analytics platform optimized for big data and machine learning. It uses Delta Lake, an open-source storage layer that adds ACID transactional guarantees to cloud-based data lakes (e.g., Amazon S3, Azure, Google Cloud). Data is stored in open Parquet format, with Delta Lake providing reliability via transaction logs, schema enforcement, and scalable metadata management. Databricks supports both SQL and Apache Spark workloads, enabling efficient ETL, ad hoc queries, analytics, collaborative notebooks, and integrated machine learning—all within a secure, enterprise-grade environment that ensures strong data governance.
Why Move Data from Altibase into Databricks
Unlocking Advanced Analytics: The Case for Copying Data from Altibase to Databricks
A data engineer or architect may wish to copy data from Altibase into Databricks for several compelling reasons. Firstly, Altibase is likely to hold valuable transactional or operational data that could generate deeper insights if analyzed effectively. By integrating this data with datasets from other systems within Databricks, organizations can uncover patterns, perform advanced analytics, and enable richer business intelligence that would be difficult to achieve using Altibase alone. Importantly, utilizing Databricks for this data integration ensures that the heavy processing workloads, such as data transformations, analytics, and machine learning model training, are carried out on a dedicated analytics platform. This approach prevents any unnecessary strain on the Altibase environment, thereby maintaining its performance for mission-critical operational tasks.
Creating a User in Altibase Database
To create a new user in an Altibase database, you must have sufficient privileges (typically DBA or SYSDBA). The following steps outline the procedure for creating a user, assigning a password, and granting basic privileges.
1. Connect to the Altibase Database
First, connect to your Altibase instance using `isql`, Altibase's command-line SQL tool, or any compatible client:

```bash
isql -u sys -p <SYS_PASSWORD> -s <SERVER_IP> -n <PORT>
```
- `<SYS_PASSWORD>`: replace with the SYS user's password
- `<SERVER_IP>`: the database server address
- `<PORT>`: Altibase port (default is 20300)
2. Create a User
Use the following SQL command to create a new user, replacing `<username>` and `<password>` with your desired values.
CREATE USER <username> IDENTIFIED BY <password>;
Example:
CREATE USER alice IDENTIFIED BY SecureP@ssw0rd;
3. Grant Privileges
At minimum, the new user will need the `CONNECT` privilege; grant additional roles or privileges as necessary.
GRANT CONNECT TO <username>;
Example:
GRANT CONNECT TO alice;
To allow user creation of tables and other resources:
GRANT RESOURCE TO <username>;
4. (Optional) Grant DBA Privileges
If the user needs database administrator rights:
GRANT DBA TO <username>;
Note: This should only be granted to trusted users.
5. Verify User Creation
You can verify that the user has been created by querying the data dictionary. In Altibase, user metadata is held in the `SYSTEM_.SYS_USERS_` table:

SELECT USER_NAME FROM SYSTEM_.SYS_USERS_ WHERE USER_NAME = '<username>';

Example:

SELECT USER_NAME FROM SYSTEM_.SYS_USERS_ WHERE USER_NAME = 'ALICE';
Example: Full Script
CREATE USER alice IDENTIFIED BY SecureP@ssw0rd;
GRANT CONNECT, RESOURCE TO alice;
Replace usernames and passwords as required for your security policies.
Installing the JDBC Driver
At the time of writing, the JDBC driver for Altibase is not included by default in Matillion Data Productivity Cloud, due to licensing or redistribution restrictions. You must manually download and install the appropriate driver in order to connect to an Altibase database.
1. Download the Altibase JDBC Driver
   - Visit https://www.altibase.com/product/.
   - Search for and download the Altibase JDBC driver. Prefer the Type 4 JDBC driver: it is a pure Java driver and generally the most suitable choice for integration with cloud systems like Matillion.
2. Review Compatibility
   - Ensure the JDBC driver version matches both your Altibase database version and the requirements of your Matillion Data Productivity Cloud deployment.
3. Upload the Driver to Matillion Data Productivity Cloud
   - Matillion requires manual uploading of external JDBC drivers in some scenarios. Instructions for this process can be found at https://docs.matillion.com/data-productivity-cloud/agent/docs/uploading-external-drivers/.
   - Follow the documented process, which typically involves accessing the user interface or your agent installation, uploading the downloaded JDBC driver JAR file, and confirming the platform recognises the new driver.
4. Configure and Use the Driver
   - After uploading, configure your connection settings in Matillion to use the new JDBC driver for Altibase.
   - For detailed usage and configuration instructions, see https://docs.matillion.com/data-productivity-cloud/designer/docs/database-query/.
   - Ensure that authentication, endpoints, and any required parameters specific to Altibase are correctly specified.
By following these steps and referencing the linked documentation, you can successfully enable Altibase connectivity in your Matillion environment.
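As a quick sanity check of your connection settings, it can help to assemble and review the JDBC URL before entering it into Matillion. The `jdbc:Altibase://<host>:<port>/<database>` URL format and the endpoint below are illustrative assumptions; confirm the exact format against the documentation shipped with your driver version.

```python
def altibase_jdbc_url(host: str, port: int = 20300, database: str = "mydb") -> str:
    """Build an Altibase JDBC connection URL.

    Assumes the common `jdbc:Altibase://<host>:<port>/<database>` shape;
    verify against your driver's own documentation.
    """
    return f"jdbc:Altibase://{host}:{port}/{database}"

# Hypothetical endpoint, for illustration only
print(altibase_jdbc_url("altibase.internal.example.com", 20300, "sales"))
```

Keeping the URL construction in one place makes it easy to switch endpoints between development and production configurations.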
Checking network connectivity
To ensure successful connectivity between Matillion Data Productivity Cloud and your Altibase database, you must verify that the database allows incoming network connections according to your deployment configuration:

- Full SaaS agent configuration: The Altibase database must be configured to accept incoming connections from the IP addresses used by Matillion Data Productivity Cloud's SaaS agents. The current list of required IP addresses can be found at https://docs.matillion.com/data-productivity-cloud/security/docs/allowing-ip-addresses/. Ensure these IP addresses are added to your database or network firewall allowlist.
- Hybrid SaaS deployment: If you are using a Hybrid SaaS deployment, your Altibase database must allow incoming connections from your own virtual private cloud (VPC). The network access check utilities at https://exchange.matillion.com/data-productivity-cloud/pipeline/check-network-access/ can help validate connectivity from your VPC environment.

Additionally, if you are connecting to the Altibase database using a DNS name rather than a fixed IP address, make sure that your Full SaaS or Hybrid SaaS agent is able to resolve the DNS name to the correct database endpoint. Proper DNS resolution is required for establishing network connectivity.
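A quick way to confirm TCP reachability from the machine hosting your agent is a short socket probe; the sketch below checks both DNS resolution and the port in one call. The hostname is a hypothetical placeholder.

```python
import socket

def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection performs DNS resolution and the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failure, connection refusal, and timeout
        return False

# Hypothetical Altibase endpoint and default port, for illustration
if can_reach("altibase.internal.example.com", 20300, timeout=3.0):
    print("Altibase port is reachable")
else:
    print("Cannot reach Altibase - check firewall allowlists and DNS resolution")
```

Run this from the same network context as your agent; a result that differs from your workstation usually points at a firewall or routing issue rather than the database itself.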
Querying Data from an Altibase Database
Running SQL Queries with Altibase
Altibase uses standard SQL for querying data. Below are examples of typical SELECT statements:
1. Basic SELECT Statement
SELECT * FROM customers;
2. Filtering Data
SELECT id, name FROM users WHERE status = 'active';
3. Ordering Results
SELECT order_id, total FROM orders ORDER BY total DESC;
4. Aggregation
SELECT department, COUNT(*) AS total_employees FROM employees GROUP BY department;
Considerations for Datatype Conversion
When migrating or integrating data between Altibase and Databricks, be aware that automatic datatype conversion may occur. For example:
- TEXT or VARCHAR in Altibase may map to STRING in Databricks.
- NUMBER or INTEGER types may map to INT or BIGINT.
- Date and time fields may require conversion to TIMESTAMP.

Always review and validate field types when designing your ETL process.
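One way to keep these conversions explicit and reviewable is to hold them in a small lookup table in your pipeline code. The mappings below are illustrative assumptions, not a definitive Altibase-to-Databricks conversion chart; validate each one against your own schema.

```python
# Illustrative Altibase -> Databricks type mappings; verify against your schema.
ALTIBASE_TO_DATABRICKS = {
    "VARCHAR": "STRING",
    "CHAR": "STRING",
    "INTEGER": "INT",
    "BIGINT": "BIGINT",
    "NUMBER": "DECIMAL(38,10)",  # precision/scale here are placeholder assumptions
    "DATE": "TIMESTAMP",
}

def map_type(altibase_type: str) -> str:
    """Map an Altibase column type to a Databricks type, defaulting to STRING."""
    return ALTIBASE_TO_DATABRICKS.get(altibase_type.upper(), "STRING")

print(map_type("varchar"))
```

Defaulting unknown types to STRING is a conservative choice: it avoids load failures, at the cost of deferring stricter typing to a later transformation step.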
Patterns for Data Loading
A common pattern when loading data from Altibase is to use two phases:
1. Initial (Full) Load
2. Incremental Load
This approach ensures efficiency by syncing the entire dataset once and then updating only with new or changed data.
Both load types typically use the same "Database Query" component in your ETL pipeline.
Initial (Full) Load Example
For a full load, you query the entire table, with no filter clause:
SELECT * FROM transactions;
Incremental Load Example
For incremental loads, use a filter—typically on a column such as a timestamp or an incrementing ID:
SELECT * FROM transactions WHERE modified_time > '2024-06-01 00:00:00';
Replace `modified_time` and the timestamp as appropriate for your schema and load window.
For further details on incremental load strategies, see Incremental Load - Data Replication Strategy.
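The two-phase pattern above can be sketched as a small helper that emits either the full or the incremental query, depending on whether a high-water mark from the previous run is available. The table and column names are hypothetical.

```python
from typing import Optional

def build_extract_query(table: str, watermark_col: str,
                        last_watermark: Optional[str] = None) -> str:
    """Return a full-load query when no watermark exists, otherwise an incremental one."""
    if last_watermark is None:
        # Initial (full) load: no filter clause
        return f"SELECT * FROM {table}"
    # Incremental load: only rows changed since the last successful run
    return f"SELECT * FROM {table} WHERE {watermark_col} > '{last_watermark}'"

print(build_extract_query("transactions", "modified_time"))
print(build_extract_query("transactions", "modified_time", "2024-06-01 00:00:00"))
```

In a production pipeline, persist the watermark after each successful load and pass timestamps as bound parameters rather than interpolated strings, to avoid injection and quoting issues.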
Data Integration Architecture
Loading data in advance of integration offers a practical "divide and conquer" approach by separating the integration process into two manageable steps: first, data is loaded into the target environment; second, it is integrated and transformed as needed. This separation is a key advantage of the Extract, Load, and Transform (ELT) architecture.

Effective data integration relies on reliable data transformation routines, and the most robust way to build these is with purpose-built data transformation pipelines. In the ELT paradigm, these transformation and integration processes are performed directly within the target environment, in this case Databricks. This offers substantial benefits: data transformation and integration become fast, scalable, and available on demand, without additional investment in dedicated data processing infrastructure.
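In this ELT setup, a typical transformation step is upserting newly loaded Altibase rows from a staging table into a curated Delta table using Databricks SQL's MERGE statement. A minimal sketch that generates such a statement is shown below; the table, key, and column names are hypothetical.

```python
def build_merge_sql(target: str, staging: str, key: str, columns: list) -> str:
    """Generate a Databricks SQL MERGE that upserts staged rows into a Delta table."""
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in columns)
    cols = ", ".join(columns)
    vals = ", ".join(f"s.{c}" for c in columns)
    return (
        f"MERGE INTO {target} t USING {staging} s ON t.{key} = s.{key} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({cols}) VALUES ({vals})"
    )

# Hypothetical schema: merge staged transactions into the curated table
print(build_merge_sql("analytics.transactions", "staging.transactions",
                      "transaction_id", ["transaction_id", "amount", "modified_time"]))
```

Generating the statement from a column list keeps the merge in step with the extraction query, so schema changes only need to be declared once.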