Extracting data from Adabas (Software AG) to Databricks
Extracting data from Adabas is a crucial step for organizations aiming to modernize their data infrastructure and leverage advanced analytics capabilities. Adabas, a high-performance database system, often contains valuable operational data that businesses wish to analyze using powerful platforms such as Databricks. Successfully migrating or integrating this data enables richer insights and more scalable data operations. This article provides a practical guide to moving data from Adabas to Databricks. We will start by guiding you through the process of creating an identity in Adabas, a necessary prerequisite for secure data access. If you are using Matillion to orchestrate your data workflows, we will explain how to verify or acquire the required JDBC driver to connect to Adabas. Ensuring robust network connectivity between your Adabas source and Databricks target is another central consideration, and we will outline key steps for configuring reliable communication. Finally, we will cover methods for querying data from Adabas, detailing both the initial data pull and strategies for ongoing incremental extractions. Whether you are developing a proof of concept or planning a production migration, this article will equip you with the essential knowledge required for a successful Adabas-to-Databricks data integration.
What is Adabas (Software AG)?
Adabas (Adaptable Database System) is a high-performance, non-relational database from Software AG, designed for large-scale transaction processing. Using an inverted list architecture, Adabas enables fast storage and retrieval of complex, hierarchical data. It natively integrates with the Natural programming language and supports multiple programming interfaces across mainframes and open systems. With robust data security, backup, and recovery features, Adabas is favored in industries such as banking, government, and utilities that require high availability, scalability, integrity, and reliability in their enterprise applications. Its efficiency and platform flexibility make Adabas ideal for mission-critical environments demanding top performance.
What is Databricks?
Databricks is a unified analytics platform based on Apache Spark, offering a scalable, collaborative space for data engineering, science, and machine learning. Its 'database' is a logical namespace within the Lakehouse, organizing tables and artifacts while abstracting data storage complexities. Integrating with cloud object stores (AWS S3, Azure, or Google Cloud) and Delta Lake, Databricks provides ACID transactions, schema enforcement, and time travel for reliable, high-performance data access. This architecture supports operational analytics and allows teams to manage structured or semi-structured data with familiar SQL in a governed, collaborative workspace.
Why Move Data from Adabas (Software AG) into Databricks
Unlocking Analytics: Copying Data from Adabas to Databricks for Powerful Insights
A data engineer or architect might choose to copy data from Adabas into Databricks for several compelling reasons. Firstly, Adabas often contains data of significant business value, but accessing and analyzing this information is limited by the constraints of the legacy database environment. By integrating Adabas data with information from other modern sources in a unified analytics platform like Databricks, organizations can enhance their ability to derive actionable insights, identify trends, and drive data-driven decision making. Moreover, performing data integration and analysis on Databricks rather than directly on Adabas ensures that complex queries and computationally intensive workloads do not impact the performance or stability of the operational Adabas system. This approach enables organizations to maximize the value of their legacy data assets while preserving the integrity and responsiveness of critical transactional databases.
Creating a User in Adabas
Adabas (Adaptable Database System) by Software AG is a high-performance, multi-platform database management system. Unlike many relational database systems, Adabas does not use SQL for administrative tasks such as user creation. Instead, user management is handled externally (e.g., via security products like Natural Security or operating system mechanisms), as Adabas itself does not natively maintain a user catalog.
The most common approach to implementing user security in Adabas is through the integration of Natural Security. Below are step-by-step instructions for creating a new user with Natural Security:
Prerequisites
- Adabas and Natural Security must be installed and accessible.
- You need Administrator or Security Administrator privileges in Natural Security.
Steps to Create a User in Natural Security for Adabas
- Access Natural Security:
  - Start the Natural environment.
  - At the Main Menu, enter NSC and press Enter to access Natural Security.
- Navigate to User Maintenance:
  - In the Natural Security Main Menu, select User Maintenance (option 3).
- Add a New User:
  - On the User Maintenance menu, press A (Add) to define a new user.
  - Fill in the required fields:
    - User ID: unique user identifier (alphanumeric)
    - User name: descriptive user name
    - Password: optional at creation; the user can be prompted at first login
    - Other: define additional security attributes, department, libraries, etc., as needed
- Define the User Profile:
  - Set the user type (e.g., Person, Group, Admin).
  - Assign access rights and the scope of authorization for different Adabas files or applications.
- Save the User Definition:
  - After entering all necessary details, use the SAVE command (PF5), or follow the on-screen instructions to store the new user definition.
- Distribute Credentials:
  - Provide the newly created user with their User ID and initial password.
Notes
- Adabas Itself Has No SQL: You cannot create users within Adabas using SQL commands.
- External Authentication: If Natural Security is not used, user authentication and authorization may rely on operating system-level controls or other security software.
Example: Adding a User (Natural Security Menu Flow)
Main Menu
-> NSC (Natural Security)
-> 3 (User Maintenance)
-> A (Add)
- User ID....: JOHNDOE
- User name..: John Doe
- Set other fields as required (password, department, etc.)
-> PF5 (Save)
For advanced automation, consult the Software AG documentation on batch administration using NSSPUTIL or security exits.
References:
- Software AG: Natural Security Documentation
- Adabas and Natural User Groups
Installing the JDBC driver
The Adabas JDBC driver is required for connecting Matillion Data Productivity Cloud to your Adabas database. Unlike some database drivers, the Adabas JDBC driver is not distributed with Matillion Data Productivity Cloud due to licensing and redistribution restrictions. To proceed, you must manually download and install the driver.
Downloading the Adabas JDBC Driver
- Go to the official Software AG site or your driver provider and download the Adabas Type 4 JDBC driver.
- NOTE: At the time of writing, you must supply your own copy of the JDBC driver, as it cannot be bundled with Matillion due to license requirements.
- Where possible, look for a Type 4 JDBC driver. Type 4 drivers are preferred because they are pure Java drivers and connect directly to the database without requiring any platform-specific native code.
- The download location for the official Adabas JDBC driver is: (insert your specific vendor download link here).
- Ensure you comply with any licensing requirements during download and before further distribution or use.
Installing the JDBC Driver in Matillion Data Productivity Cloud
- After downloading the driver, consult the official Matillion documentation for instructions on how to upload and configure external JDBC drivers.
- Refer to: Matillion – Uploading External Drivers
- Typically, you will:
  - Access the Matillion Agent or JDBC driver administration interface.
  - Upload your downloaded Adabas JDBC driver (usually a .jar file).
  - Deploy/activate the driver according to your workspace or agent configuration.
- Once installed, the driver will be available as a selectable option for database connectors and query tools within the Matillion platform.
Using the JDBC Driver in Matillion
After the JDBC driver has been successfully installed, refer to the official usage guide for establishing connections and running queries against your Adabas database within Matillion Data Productivity Cloud.
Follow these instructions to set up your database connections, configure connection parameters, and securely authenticate as required by your Adabas installation.
By following these steps, your Matillion environment will be equipped to communicate with Adabas using the preferred JDBC interface.
Checking network connectivity
To ensure successful connectivity between the Matillion Data Productivity Cloud and your Adabas database, you must verify that the database allows incoming network connections from the appropriate sources, depending on your deployment configuration:
- Full SaaS Agent Configuration: Configure your Adabas database network security settings to allow incoming connections from the Matillion-provided IP addresses. You can find the current list of required IP addresses in the Matillion Cloud Agent IP Address Documentation.
- Hybrid SaaS Deployment: In this scenario, Matillion agents operate from within your own virtual private cloud (VPC). You must ensure that your Adabas database permits incoming connections from the VPC where your Matillion agent is deployed. For assistance verifying connectivity and identifying required outbound addresses, use the Matillion Network Access Checker.
Additionally, if your Adabas database is referenced by a DNS hostname rather than an IP address, the Full SaaS Agent or Hybrid SaaS Agent must be able to resolve that hostname in order to connect. Ensure that DNS resolution works from the network environment in which the agent runs.
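Before wiring up Matillion, it can help to confirm from the agent's network that the Adabas host resolves in DNS and that the database port accepts TCP connections. The following is a minimal Python sketch of such a check; the host and port you pass in are placeholders for your own environment:

```python
import socket

def check_adabas_reachability(host: str, port: int, timeout: float = 5.0) -> dict:
    """Check DNS resolution and plain TCP reachability of an Adabas endpoint."""
    result = {"dns_ok": False, "tcp_ok": False, "addresses": []}
    try:
        # Resolve the hostname the same way the agent would.
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
        result["addresses"] = sorted({info[4][0] for info in infos})
        result["dns_ok"] = True
    except socket.gaierror:
        return result
    try:
        # create_connection handles both IPv4 and IPv6 targets.
        with socket.create_connection((host, port), timeout=timeout):
            result["tcp_ok"] = True
    except OSError:
        pass
    return result
```

Note that a successful TCP connection only proves the port is open; authentication and driver configuration are still verified separately through Matillion.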
Querying Data from an Adabas Database
This guide outlines how to query data from an Adabas database, with examples using SQL SELECT statements. It also highlights integration patterns, datatype conversion considerations with platforms like Databricks, and best practices for data loading.
1. Example Adabas Queries using SQL SELECT
Although Adabas natively uses a hierarchical data model and its own query mechanisms (such as Natural, Adabas SQL Gateway, or third-party adapters), data is often accessed via SQL when integrating with modern data platforms.
Example: Full Table Extract (Initial Load)
SELECT
CUSTOMER_ID,
FIRST_NAME,
LAST_NAME,
EMAIL_ADDRESS,
CREATED_DT
FROM
CUSTOMER
Example: Incremental Extract
SELECT
CUSTOMER_ID,
FIRST_NAME,
LAST_NAME,
EMAIL_ADDRESS,
CREATED_DT
FROM
CUSTOMER
WHERE
MODIFIED_DT > '2024-01-01 00:00:00'
Note: Adjust the field and filter expression based on your schema and incremental extraction strategy.
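If you script your extracts, the full and incremental queries above differ only by the WHERE clause, so both can come from a single query builder. Below is a minimal Python sketch using the table and column names from this article's examples; the watermark column (MODIFIED_DT) is an assumption you should replace with the appropriate field in your own schema:

```python
from datetime import datetime
from typing import Optional

# Column list taken from the article's CUSTOMER example.
CUSTOMER_COLUMNS = ["CUSTOMER_ID", "FIRST_NAME", "LAST_NAME", "EMAIL_ADDRESS", "CREATED_DT"]

def build_extract_query(table: str, columns: list,
                        watermark_column: Optional[str] = None,
                        watermark_value: Optional[datetime] = None) -> str:
    """Build a full-load SELECT, or an incremental one when a watermark is given."""
    query = f"SELECT {', '.join(columns)} FROM {table}"
    if watermark_column and watermark_value:
        # Timestamp literal formatted as in the article's examples.
        query += f" WHERE {watermark_column} > '{watermark_value:%Y-%m-%d %H:%M:%S}'"
    return query
```

Using the same builder for both load types keeps the initial and incremental extraction logic from drifting apart over time.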
2. Datatype Conversion: Adabas to Databricks
When moving data to Databricks (or any platform using SQL-based analytics):
- Datatype Mappings:
- Adabas Alphanumeric → VARCHAR (or STRING)
- Adabas Numeric (Unpacked/packed) → INTEGER or DECIMAL
- Adabas Date Fields → TIMESTAMP or DATE (Ensure correct format conversion!)
- Carefully handle field constraints, length restrictions, and encoding (Adabas may use EBCDIC; Databricks expects UTF-8 by default).
If you are using an ETL tool (e.g., Matillion) to bridge Adabas and Databricks, refer to the tool’s official datatype mapping documentation for precise rules.
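To make the mappings above concrete, here is a hedged Python sketch. The category names mirror this article's bullet list rather than official Adabas field-format codes, and cp037 (US/Canada EBCDIC) is an assumed code page; substitute whichever your mainframe actually uses:

```python
# Illustrative mapping from the datatype categories above to Databricks SQL types.
# Verify against your field definitions and your ETL tool's mapping documentation.
ADABAS_TO_DATABRICKS = {
    "ALPHANUMERIC": "STRING",
    "UNPACKED": "DECIMAL",
    "PACKED": "DECIMAL",
    "BINARY": "BIGINT",
    "DATE": "DATE",      # needs explicit format conversion on load
}

def map_adabas_type(category: str) -> str:
    """Return the Databricks target type, defaulting to STRING for unknowns."""
    return ADABAS_TO_DATABRICKS.get(category.upper(), "STRING")

def decode_ebcdic(raw: bytes, codepage: str = "cp037") -> str:
    """Decode mainframe EBCDIC bytes into a Python (Unicode) string.

    cp037 is an assumption; pass the code page your system actually employs.
    """
    return raw.decode(codepage)
```

Defaulting unknown categories to STRING is a deliberately conservative choice: it avoids silent truncation at the cost of requiring a later cast in Databricks.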
3. Best Practices: Initial and Incremental Data Loads
A typical pattern for migrating or integrating Adabas data consists of:
- Initial Load: Perform a one-time, full-table extract (no filter clause) to capture all current data.
-- Example: No filter used
SELECT * FROM ORDERS
- Incremental Loads: Capture only new or changed rows by applying a filter (e.g., a date or sequential key).
-- Example: Filter on last-modified date field
SELECT * FROM ORDERS WHERE LAST_MODIFIED > '2024-06-01 00:00:00'
- Database Query Component: Use the same extract/query command for both full and incremental loads, only varying the filter condition relevant to your incremental extraction logic.
Read more about these patterns in the Incremental Load Data Replication Strategy documentation.
Tip: Always validate extracted data for datatype integrity and completeness, especially when moving between mainframe-originated Adabas datasets and SQL-based analytics platforms.
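The incremental pattern above depends on remembering the high-water mark between runs. Here is a minimal Python sketch of that bookkeeping, using a local JSON file purely for illustration; in production the watermark is more commonly kept in a control table in Databricks or managed by your ETL tool:

```python
import json
from pathlib import Path

def load_watermark(state_file: Path, default: str = "1900-01-01 00:00:00") -> str:
    """Read the last successfully extracted timestamp, or a default for the initial load."""
    if state_file.exists():
        return json.loads(state_file.read_text())["last_modified"]
    return default

def save_watermark(state_file: Path, last_modified: str) -> None:
    """Persist the new high-water mark only after the load into Databricks succeeds."""
    state_file.write_text(json.dumps({"last_modified": last_modified}))
```

Saving the watermark only after a confirmed load means a failed run is simply re-extracted from the previous mark, trading a little duplicate work for no data loss.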
Data Integration Architecture
Loading data in advance of integration is a proven strategy for simplifying complex data workflows, as it allows organizations to tackle the integration process in two distinct steps. This approach is a key advantage of the ELT (Extract, Load, Transform) architecture: first, raw data is loaded into the Databricks environment, and only then are transformations applied to prepare the data for integration and analysis. Data integration inherently requires transformation tasks—such as normalization, aggregation, or joining disparate datasets—and the most effective way to accomplish these is through scalable data transformation pipelines. Another important benefit of the ELT architecture is that all data transformation and integration activities take place within the target Databricks database itself. As a result, these processes become fast, on-demand, and highly scalable, while eliminating the need for, and cost of, maintaining separate data processing infrastructure.
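To make the two-step ELT ordering concrete, the sketch below generates an illustrative pair of Databricks SQL statements: a raw load into a staging table, then an in-warehouse transformation. The table names, landing path, and the transformation itself are hypothetical placeholders:

```python
def elt_statements(staging_table: str, target_table: str) -> list:
    """Return the ordered SQL for a two-step ELT flow in Databricks.

    Step 1 loads raw extracted files as-is; step 2 transforms them
    entirely inside the warehouse. All names here are illustrative.
    """
    load = (
        f"COPY INTO {staging_table} "
        "FROM '/mnt/landing/adabas/customer/' "
        "FILEFORMAT = PARQUET"
    )
    transform = (
        f"CREATE OR REPLACE TABLE {target_table} AS "
        "SELECT CUSTOMER_ID, UPPER(LAST_NAME) AS LAST_NAME, CREATED_DT "
        f"FROM {staging_table}"
    )
    return [load, transform]  # in ELT, the load always precedes the transform
```

Because both statements run inside Databricks, the transformation scales with the warehouse rather than with any intermediate processing server.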