Extracting data from Sybase Adaptive Server Enterprise (SAP ASE) to Databricks
Extracting data from Sybase Adaptive Server Enterprise (ASE) is a crucial step in many data integration and modernization projects. As organizations increasingly leverage data analytics platforms such as Databricks, ensuring a robust and efficient data pipeline between Sybase ASE and Databricks becomes essential. This article provides a practical guide to moving your data from Sybase ASE into Databricks, addressing each of the main stages involved. We will begin by walking you through the creation of an appropriate database identity in Sybase ASE, ensuring that you have the suitable permissions and access control in place. For users leveraging Matillion as their ETL tool, we will review how to check for or acquire the correct JDBC driver, which is necessary for communicating between systems. The article will also outline key considerations for establishing and maintaining reliable network connectivity between the source and target environments. Finally, we will explore methods for both initial bulk querying and subsequent incremental loading of data, helping you to maximize both the efficiency and reliability of your data transfers. By following this guide, you can confidently extract, transform, and load data from Sybase ASE into Databricks and begin unlocking greater value from your enterprise data assets.
What is Sybase Adaptive Server Enterprise (SAP ASE)?
Sybase Adaptive Server Enterprise (ASE) is a high-performance RDBMS originally by Sybase, now maintained by SAP. Designed for mission-critical, data-heavy applications, it excels in online transaction processing (OLTP) with strong scalability, reliability, and advanced concurrency controls. ASE supports full SQL standards, stored procedures, triggers, and robust security features for enterprise use. Its architecture enables efficient memory use, fast transaction processing, data compression, and multiplexing. The system integrates with various development tools and application servers, offering native replication, clustering, and disaster recovery for high availability and data integrity—making it ideal for organizations requiring consistent uptime.
What is Databricks?
Databricks is a powerful data management platform built on Apache Spark, featuring collaborative analytics, scalable processing, and machine learning workflows. Its database uses Delta Lake, an open-source storage layer providing ACID transactions, scalable metadata, and data versioning for large-scale data lakes. Users manage varied data with SQL or APIs in languages like Python, Scala, and R. Integration with cloud storage (e.g., AWS S3, Azure Data Lake Storage) ensures elastic scaling and data sharing. Features such as time travel, schema enforcement, and change data capture boost reliability. Databricks' managed services reduce infrastructure hassles, letting data teams focus on insight generation.
Why Move Data from Sybase Adaptive Server Enterprise (SAP ASE) into Databricks
Unlocking Advanced Analytics: Moving Data from Sybase ASE to Databricks
A data engineer or architect may wish to copy data from Sybase Adaptive Server Enterprise (ASE) into Databricks for several compelling reasons. Sybase ASE often contains important transactional or historical data that holds significant potential value when leveraged appropriately. However, the true utility of this data is often realized when it is integrated with information from other sources, enabling more comprehensive analytics, advanced data science, and richer business intelligence outcomes. By offloading the data from Sybase ASE into Databricks, organizations can perform complex data integration, transformation, and analysis tasks without impacting the performance or stability of the original Sybase ASE system. Databricks provides a scalable environment for processing large volumes of data, making it possible to extract deeper insights while ensuring the operational workload on Sybase ASE remains unchanged.
Creating a User in Sybase Adaptive Server Enterprise (ASE)
This guide provides step-by-step instructions for creating a database user in Sybase ASE. You must have appropriate administrative privileges (typically `sa` or the database owner) to perform these actions.
1. Prerequisites
- You must have a valid Sybase ASE login (server-level identity).
- You need access to the target database where the user will be created.
2. Step 1: Create a Server Login (if needed)
Before creating a user in a database, ensure the login exists at the server level. If the login does not exist, create it using the following SQL:
```
-- Replace 'new_login' with the desired login name
-- Replace 'secure_password' with a strong password
sp_addlogin 'new_login', 'secure_password'
GO
```

To verify that a login exists:

```
SELECT name FROM master..syslogins WHERE name = 'new_login'
GO
```
3. Step 2: Create a Database User
Connect to the target database and map the server login to a database user. You may optionally specify a default database for the login.
```
-- Switch to the desired database
USE target_database
GO

-- Create a user mapped to the server login
-- Syntax: sp_adduser 'login_name' [, 'user_name' [, 'group_name']]
sp_adduser 'new_login', 'new_user'
GO
```

To verify the user was successfully added:

```
SELECT name FROM sysusers WHERE name = 'new_user'
GO
```
4. Step 3: Grant Permissions (Optional)
Grant the required permissions/roles to the user as needed:
```
-- Grant SELECT privilege on a table to the new user
GRANT SELECT ON dbo.some_table TO new_user
GO

-- Alternatively, place the user in an existing group at creation time
-- (the group must already exist; create one with sp_addgroup)
sp_adduser 'new_login', 'new_user', 'some_group'
GO
```
Notes
- The `sp_adduser` procedure must be executed from within the target database context.
- Usernames and logins must conform to the naming rules established in your environment.
- For more granular permissions, use the `GRANT` and `REVOKE` statements.
Installing the JDBC Driver
The JDBC driver for Sybase Adaptive Server Enterprise (ASE) is necessary for integrating Sybase data sources with Matillion Data Productivity Cloud. However, due to licensing and redistribution restrictions, this driver is not distributed with Matillion by default and must be obtained and installed manually.
Follow the step-by-step instructions below to download and install the Sybase ASE JDBC driver for use with Matillion Data Productivity Cloud.
1. Download the Sybase JDBC Driver
- Visit the SAP Sybase ASE product page: https://www.sap.com/products/sybase-ase.html
- Locate the JDBC driver: search the Downloads or Resources section of the above page for the JDBC driver. To ensure optimal compatibility and performance, download a Type 4 JDBC driver, which is a pure Java driver and does not require native libraries.
- Obtain the required files: download the JDBC driver package (most commonly distributed as a JAR file, such as `jconn4.jar`).
2. Install the JDBC Driver in Matillion Data Productivity Cloud
Because the Sybase ASE JDBC driver is not bundled with Matillion Data Productivity Cloud, you must upload it manually following the platform’s guidelines.
- Review the external driver upload instructions: detailed steps for uploading external JDBC drivers are provided in the Matillion documentation ("Uploading External Drivers").
- Follow these steps:
  - Access Matillion Data Productivity Cloud as an administrator.
  - Navigate to the appropriate section to manage or upload JDBC drivers.
  - Upload the `jconn4.jar` (or equivalent driver JAR) file, ensuring it is correctly associated with the Sybase database connector or your Data Productivity Agent.
Note: Always refer to the latest Matillion documentation to ensure you are following the correct procedure for your version of the platform.
3. Using the Driver
After uploading and installing the JDBC driver, you are ready to use it in your Matillion Data Productivity Cloud projects. For guidance on configuring and querying the Sybase database connection:
- See the Matillion usage instructions: "Database Query Usage in Designer"
This documentation provides detailed steps for setting up your data source, configuring connections, and performing database operations using your custom-installed JDBC driver.
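For context on what the driver connects to: jConnect (`jconn4.jar`) uses JDBC URLs of the form `jdbc:sybase:Tds:host:port/database`, with 5000 as the conventional ASE listener port. The sketch below builds such a URL; the hostname, port, and database name are placeholder assumptions, and Matillion constructs this URL for you once the driver is uploaded.

```python
def sybase_jdbc_url(host, port=5000, database=None):
    """Build a jConnect-style JDBC URL for Sybase ASE.

    jConnect (jconn4.jar) uses the form jdbc:sybase:Tds:host:port[/database].
    The host, port, and database here are placeholders for your environment.
    """
    url = f"jdbc:sybase:Tds:{host}:{port}"
    if database:
        url += f"/{database}"
    return url

# Example with a hypothetical host and database:
print(sybase_jdbc_url("ase.example.com", 5000, "sales"))
# jdbc:sybase:Tds:ase.example.com:5000/sales
```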
Checking network connectivity
To enable connectivity between Matillion Data Productivity Cloud and your Sybase Adaptive Server Enterprise database, you must ensure that the database allows incoming connections according to your deployment configuration:
- Full SaaS agent configuration: make sure that your Sybase Adaptive Server Enterprise database allows incoming connections from the set of public IP addresses used by Matillion. You can find the latest list of these IP addresses in the Matillion documentation.
- Hybrid SaaS deployment: configure your Sybase Adaptive Server Enterprise database to permit incoming connections from the IP address range of your own virtual private cloud (VPC). Tools and utilities to help you check or validate network access are described in the Matillion documentation.
Additionally, if your Sybase Adaptive Server Enterprise database is referenced by a DNS hostname rather than a direct IP address, ensure that the Full SaaS or Hybrid SaaS agents are able to successfully resolve this DNS address to maintain successful connectivity.
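As a quick sanity check from the agent host, you can verify both DNS resolution and TCP reachability of the ASE listener with a short script. This is an illustrative sketch: the hostname and port (5000 is the conventional ASE default) are assumptions for your environment, and a successful TCP connection proves only network reachability, not that credentials or the driver are configured correctly.

```python
import socket

def check_ase_connectivity(host, port=5000, timeout=5.0):
    """Return (resolved_ip, reachable) for the given ASE host and port."""
    try:
        ip = socket.gethostbyname(host)  # DNS resolution step
    except socket.gaierror:
        return None, False               # hostname does not resolve
    try:
        # Attempt a plain TCP connection to the ASE listener
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True
    except OSError:
        return ip, False                 # resolved, but port unreachable

# Example: resolves localhost; the port is closed unless ASE runs there
print(check_ase_connectivity("localhost", 5000))
```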
Querying Data from Sybase Adaptive Server Enterprise (ASE)
This guide covers essential patterns for querying data from a Sybase Adaptive Server Enterprise database, particularly when transferring data for use in platforms like Databricks. It includes example SQL SELECT statements, a discussion of datatype conversion, and best practices for initial and incremental data loads.
Example Sybase ASE SQL SELECT Statements
The following examples illustrate common uses of the `SELECT` statement in Sybase ASE:
```
-- Select all columns from a table
SELECT * FROM sales.orders;

-- Select specific columns
SELECT order_id, customer_id, order_date FROM sales.orders;

-- Filter records with a WHERE clause
SELECT * FROM sales.orders WHERE order_status = 'Shipped';

-- Select rows between date ranges
SELECT * FROM sales.orders WHERE order_date BETWEEN '2024-01-01' AND '2024-03-31';

-- Aggregation example
SELECT customer_id, COUNT(*) AS total_orders FROM sales.orders GROUP BY customer_id;
```
Datatype Conversion: Sybase ASE to Databricks
When transferring data from Sybase ASE to Databricks, automatic datatype conversion may occur. Common examples include:
- `VARCHAR` / `CHAR` → `STRING`
- `INT` / `SMALLINT` / `BIGINT` → equivalent integer types
- `DATETIME` / `SMALLDATETIME` → `TIMESTAMP`
- `FLOAT` / `REAL` → `DOUBLE`
Review datatype mappings to prevent data quality issues and ensure compatibility between source and target platforms.
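When generating a target schema programmatically, the conversions above can be expressed as a small lookup table. This is an illustrative sketch mirroring the list in this section, not the actual conversion logic of Matillion or Databricks; verify it against your tooling's behavior before relying on it.

```python
# Illustrative Sybase ASE -> Databricks SQL type mapping,
# mirroring the common conversions listed above.
ASE_TO_DATABRICKS = {
    "VARCHAR": "STRING",
    "CHAR": "STRING",
    "SMALLINT": "SMALLINT",
    "INT": "INT",
    "BIGINT": "BIGINT",
    "DATETIME": "TIMESTAMP",
    "SMALLDATETIME": "TIMESTAMP",
    "FLOAT": "DOUBLE",
    "REAL": "DOUBLE",
}

def map_type(ase_type):
    """Map an ASE type name to a Databricks SQL type.

    Falls back to STRING for unmapped types -- an assumption made
    for this sketch, not a documented default.
    """
    return ASE_TO_DATABRICKS.get(ase_type.upper(), "STRING")

print(map_type("datetime"))  # TIMESTAMP
```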
Data Loading Patterns: Initial and Incremental Loads
For loading Sybase ASE data into Databricks, the optimal approach consists of:
- Once-off Initial Load: Migrating all existing records from the source system.
- Incremental Loads: Periodically transferring only new or updated records.
Both patterns utilize the same Database Query component. However, the application of filters in the SQL queries differs.
Initial Load Example
The initial load captures all table data with no filtering:
```
SELECT * FROM sales.orders;
```
Configure the Database Query component without a `WHERE` clause for the initial extraction.
Incremental Load Example
Subsequent (incremental) loads transfer only new or changed records. This relies on filtering, typically using a column such as `last_modified` or `order_date`. For maximum performance and correctness, ensure that filter columns are indexed and reliably maintained.
```
SELECT * FROM sales.orders WHERE last_modified > '2024-06-01 00:00:00';
```
Update the filter condition for each load, corresponding to the latest value already extracted.
For detailed guidelines, refer to the Incremental Load Data Replication Strategy.
Note: Use consistent Database Query component settings for both approaches, modifying only the `WHERE` clause to switch between initial and incremental loads.
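The switch between initial and incremental extraction can be sketched as a small query builder that applies a filter only when a high-water mark is known. The table and column names below are placeholders; in practice you would persist the watermark between runs and, for anything beyond an illustration, use parameterized queries rather than string interpolation.

```python
def build_extract_query(table, watermark_column=None, last_value=None):
    """Build the SELECT used for extraction.

    With no watermark this is the once-off initial load; with a
    watermark column and the last value already extracted it becomes
    the incremental load. Names here are illustrative placeholders.
    """
    query = f"SELECT * FROM {table}"
    if watermark_column and last_value is not None:
        # Only rows changed since the previous extraction
        query += f" WHERE {watermark_column} > '{last_value}'"
    return query

# Initial load: no filter
print(build_extract_query("sales.orders"))
# Incremental load: only rows modified after the stored watermark
print(build_extract_query("sales.orders", "last_modified", "2024-06-01 00:00:00"))
```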
Data Integration Architecture
Loading data in advance of integration is a classic way to divide and conquer data engineering challenges, as it neatly separates the loading (ingestion) phase from the transformation and integration phase. This strategy, a key feature of the ELT (Extract, Load, Transform) architecture, offers several advantages. Firstly, it allows teams to focus on loading raw data into the Databricks environment as a discrete, manageable step. Once the data is available in the Databricks database, integration and necessary transformations can proceed independently using robust data transformation pipelines—these pipelines facilitate scalable, automated, and auditable workflows for cleaning, enriching, and joining data from multiple sources. Another major benefit of the ELT pattern is that all data transformation and integration occur inside the target Databricks database itself. This means organizations benefit from fast, on-demand, and highly scalable processing capabilities, without the overhead or expense of maintaining separate data processing infrastructure outside Databricks.