Integrate data from Sybase SQL Anywhere to Amazon Redshift using Matillion

Our Sybase SQL Anywhere to Redshift connector enables seamless and timely data transfers to Redshift, ensuring your data remains current—without requiring manual coding or complicated ETL processes.


Extracting data from Sybase SQL Anywhere to Amazon Redshift

Extracting data from Sybase SQL Anywhere is a critical task for organizations looking to leverage their existing operational data within Amazon Redshift for advanced analytics and reporting. Successfully moving data from Sybase SQL Anywhere to Redshift involves a series of technical preparatory steps and careful planning to ensure data integrity and efficiency. In this article, we will guide you through the essential stages required for a smooth data extraction and loading process.

First, we will describe how to create an appropriate identity (user) in Sybase SQL Anywhere for data extraction. For Matillion ETL users, we will explain how to verify and, if necessary, acquire the correct JDBC driver to enable connectivity. We will then discuss the network connectivity requirements to enable secure and reliable communication between your source and target systems. Finally, we will provide strategies for querying your data—covering both initial full extracts and methods for incremental data loading. By the end of this article, you will have a clear understanding of the steps and best practices involved in transferring your Sybase SQL Anywhere data into Amazon Redshift.


What is Sybase SQL Anywhere?

Sybase SQL Anywhere, now a product of SAP, is a relational database management system engineered for embedded, mobile, and remote environments, as well as traditional enterprise use cases. Distinguished by its small footprint, cross-platform support, and self-management capabilities, SQL Anywhere is optimized for decentralized data architectures where reliable performance with minimal administrative overhead is essential. It supports robust synchronization mechanisms—such as MobiLink—for bi-directional data exchange with centralized databases, making it particularly suitable for field applications, point-of-sale systems, or edge computing scenarios. Comprehensive support for SQL, high availability features, encryption, and extensive connectivity options underscore its utility in both on-premises and distributed cloud-based architectures.


What is Amazon Redshift?

Amazon Redshift is a fully managed, petabyte-scale AWS data warehouse designed for fast analytics using columnar storage and massively parallel processing (MPP). It tightly integrates with AWS, enabling easy data loading from S3, DynamoDB, and relational databases via Glue or JDBC/ODBC. Redshift uses result caching, zone maps, compression, and vacuuming to boost query speed and cut costs. Redshift Spectrum lets users run SQL directly on exabyte-scale data in S3 without loading it first. Features like workload management, auto-scaling (RA3 nodes), and strong security make Redshift popular for business intelligence, reporting, and data lake analytics.

Why Move Data from Sybase SQL Anywhere into Amazon Redshift

Unlocking Analytical Potential: Transferring Data from Sybase SQL Anywhere to Amazon Redshift

Sybase SQL Anywhere often holds business-critical data accumulated from operational applications, making it a potentially valuable resource for analytics and reporting. Copying this data into Amazon Redshift enables integration with information from other sources—such as customer relationship management platforms, web analytics, and financial systems—which can reveal deeper insights than siloed analysis. By leveraging Redshift for this data integration and analytical processing, organizations can offload the heavy computational and query workloads from Sybase SQL Anywhere. This approach preserves the performance of transactional systems while utilizing Redshift’s scalable infrastructure to enable complex queries and large-scale data processing.

Creating a User in Sybase SQL Anywhere

This guide outlines the steps for creating a new user in a Sybase SQL Anywhere database via SQL statements.

Prerequisites

  • DBA (or appropriate administrative) privileges on the target database
  • Database connection via your preferred client (e.g., Interactive SQL, dbisql, or another SQL interface)

Step 1: Connect to the Database

Ensure you are connected to your SQL Anywhere database with a user account that has the necessary permissions to create other users.

-- Example connection using Interactive SQL:
-- (This is conceptual—actual connection strings will vary)
CONNECT TO "database_name" USER "DBA" IDENTIFIED BY 'password';

Step 2: Create the User

Use the CREATE USER statement as follows, replacing new_user and new_password with your desired username and password.

CREATE USER new_user IDENTIFIED BY 'new_password';

  • new_user: The name of the user to create (follow naming rules).
  • new_password: The password for the user.

Example:

CREATE USER reporting_app IDENTIFIED BY 'Rep0rt!ng#2024';

Step 3: Set User Permissions (Optional)

By default, the new user does not have access to database objects. Grant the appropriate permissions according to your application's requirements.

-- Grant user the ability to connect
GRANT CONNECT TO reporting_app;

-- Grant specific object privileges (example)
GRANT SELECT ON Sales TO reporting_app;

Additional Notes

  • To create a user who can authenticate from the operating system or from LDAP, consult the SQL Anywhere documentation on external logins.
  • To assign roles or administrative privileges, use the GRANT statement:

-- For example, to grant DBA authority (use with caution!)
GRANT DBA TO reporting_app;

Replace the placeholders in the examples with your actual database, username, password, and object names.

Installing the JDBC driver

The JDBC driver for Sybase SQL Anywhere is not included by default in Matillion Data Productivity Cloud. Due to licensing and redistribution restrictions, it must be obtained and installed manually before establishing database connections within your environment. Follow the steps below to download and install the appropriate driver in Matillion Data Productivity Cloud.

1. Download the JDBC driver

  • Visit the official SAP Sybase SQL Anywhere product page.
  • Locate the JDBC drivers section and search for the Type 4 JDBC driver for Sybase SQL Anywhere. The Type 4 driver is a pure Java implementation and is preferred for most use cases due to its portability.
  • Download the most recent stable version of the Type 4 JDBC driver to your local system, in the format provided (typically a JAR file).

Note: You may need to create an SAP account or agree to licensing terms before downloading the driver.

2. Install the JDBC driver into Matillion Data Productivity Cloud

To upload and install the Sybase SQL Anywhere JDBC driver into your Matillion Agent, reference the following official instructions: Uploading external drivers.

  • Ensure that you have the appropriate permissions to upload external JDBC drivers to your Matillion instance.
  • Follow the documented procedure to upload the Sybase SQL Anywhere JDBC driver JAR file you downloaded in the previous step.
  • Complete all required configuration steps to make the driver available to your Matillion users and workflows.

3. Next Steps—Driver Usage

Once the driver installation is complete, you can connect to your Sybase SQL Anywhere database using Matillion’s built-in Database Query components. For detailed instructions on configuring and establishing database connections, see the official guide: Using the Database Query component.
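Before configuring the Database Query component, it can help to assemble the JDBC connection URL up front. The helper below is an illustrative sketch, not part of any Matillion API; the jdbc:sqlanywhere:Host=...;DatabaseName=... URL format and the default port 2638 are assumptions to verify against the documentation of the driver version you installed:

```python
def sqlanywhere_jdbc_url(host: str, port: int = 2638, database: str = "") -> str:
    """Build a JDBC URL for a SQL Anywhere driver.

    The URL format and default port here are assumptions; check them
    against your installed driver's documentation before use.
    """
    url = f"jdbc:sqlanywhere:Host={host}:{port}"
    if database:
        url += f";DatabaseName={database}"
    return url

# Example: hypothetical host and database names
print(sqlanywhere_jdbc_url("db.internal.example.com", database="sales"))
```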

By following these steps, you can integrate Matillion Data Productivity Cloud with your Sybase SQL Anywhere data sources.

Checking network connectivity

To ensure successful connectivity between Matillion Data Productivity Cloud and your Sybase SQL Anywhere database, it is essential to properly configure your network to accept incoming connections. Depending on your deployment type, follow the guidelines below:

Full SaaS Agent Configuration

  • You must allow incoming connections to your Sybase SQL Anywhere database from the IP addresses listed in the following documentation: Matillion Public IP Addresses.
  • These IP addresses represent the outbound addresses used by the Full SaaS agent when accessing your database.

Hybrid SaaS Deployment

  • In a Hybrid SaaS configuration, allow incoming connections from your own virtual private cloud (VPC) where the Matillion Agent instances are running.
  • You can use utilities provided by Matillion to help check and validate network connectivity: Network Access Checker.

DNS Name Resolution

  • If you specify the Sybase SQL Anywhere host using a DNS name instead of a direct IP address, ensure that the Full SaaS or Hybrid SaaS agent can resolve that name to the correct IP. Connectivity will fail if DNS resolution does not succeed.
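The checks above (firewall or VPC rules, then DNS resolution) can be rehearsed from the agent's network with a short script before configuring anything in Matillion. This is an illustrative sketch, not a Matillion utility; port 2638 is SQL Anywhere's conventional network server port, so substitute whatever port your database actually listens on:

```python
import socket

def check_database_reachability(host: str, port: int = 2638, timeout: float = 5.0) -> str:
    """Return 'dns-failure', 'tcp-failure', or 'ok' for a host/port pair."""
    try:
        # Step 1: DNS resolution -- this is where connectivity fails when the
        # agent cannot resolve a host given as a DNS name rather than an IP.
        addr = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)[0][4][0]
    except socket.gaierror:
        return "dns-failure"
    try:
        # Step 2: TCP handshake -- fails if a firewall or security group
        # does not allow inbound connections from the agent's IP range.
        with socket.create_connection((addr, port), timeout=timeout):
            return "ok"
    except OSError:
        return "tcp-failure"
```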

Querying Data from Sybase SQL Anywhere Database

This guide explains how to query data from a Sybase SQL Anywhere database, with a particular focus on ETL scenarios that interact with Amazon Redshift. It also covers best practices for initial and incremental data loading patterns using ETL tools like Matillion.


Example Sybase SQL Anywhere SELECT Queries

Basic SELECT query:

SELECT emp_id, emp_name, hire_date
FROM employee;

SELECT with filter (used in incremental loading):

SELECT emp_id, emp_name, hire_date
FROM employee
WHERE hire_date > '2024-06-01';

SELECT with JOIN:

SELECT e.emp_id, e.emp_name, d.dept_name
FROM employee e
JOIN department d ON e.dept_id = d.dept_id;

SELECT with datatype conversion (cast as needed for compatibility with Redshift):

SELECT emp_id,
       emp_name,
       CAST(hire_date AS VARCHAR(19)) AS hire_date_str
FROM employee;


Data Type Considerations: Sybase SQL Anywhere to Redshift

When transferring data from SQL Anywhere to Amazon Redshift, datatype conversions may be necessary. Key points:

  • CHAR/VARCHAR: These are generally compatible, but check for maximum length constraints.
  • DATETIME/TIMESTAMP: Redshift supports TIMESTAMP; ensure Sybase DATETIME columns are converted if needed.
  • NUMERIC/DECIMAL: These types are compatible, but verify scale and precision.
  • BINARY/VARBINARY: These may require encoding/decoding steps, as Redshift support is limited.

Example (converting DATETIME to VARCHAR for Redshift compatibility):

SELECT CAST(hire_date AS VARCHAR(19)) AS hire_date
FROM employee;
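The considerations above can be captured in a small type-mapping table when generating Redshift DDL from SQL Anywhere metadata. The mapping below is a sketch under stated assumptions—in particular, VARBYTE as the binary target and the fallback choice are illustrative; verify each pairing against your Redshift version's documentation:

```python
# Illustrative source-to-target type mapping; verify each choice against
# your Redshift version's documentation before generating DDL with it.
TYPE_MAP = {
    "CHAR": "CHAR",
    "VARCHAR": "VARCHAR",
    "DATETIME": "TIMESTAMP",
    "TIMESTAMP": "TIMESTAMP",
    "NUMERIC": "NUMERIC",
    "DECIMAL": "DECIMAL",
    "BINARY": "VARBYTE",      # Redshift binary support is limited vs. the source
    "VARBINARY": "VARBYTE",
}

def redshift_column_type(source_type: str, length: int = 0) -> str:
    """Map a SQL Anywhere type name to a Redshift type, carrying the
    declared length for character types; unknown types fall back to VARCHAR."""
    target = TYPE_MAP.get(source_type.upper(), "VARCHAR")
    if target in ("CHAR", "VARCHAR") and length:
        return f"{target}({length})"
    return target
```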


Loading Patterns: Initial and Incremental Loads

The recommended ETL pattern is to perform a one-time initial load followed by incremental loads. The same Database Query component can be used for both cases in your ETL pipeline (e.g., Matillion, Apache Airflow, custom scripts).

1. Initial Load

  • Purpose: Load all existing data from the source table.
  • Pattern: No filter in the WHERE clause.
  • Query Example:

SELECT emp_id, emp_name, hire_date
FROM employee;

2. Incremental Load

  • Purpose: Only load new or changed records since the last extraction.
  • Pattern: Use a WHERE clause to filter new data based on a high-water mark (e.g., a timestamp or incrementing ID).
  • Query Example:

SELECT emp_id, emp_name, hire_date
FROM employee
WHERE hire_date > ?;  -- Replace ? with the high-water mark from the previous load

Further reading: Incremental Load & Data Replication Strategy (Matillion Exchange)

Use declarative parameters or scripting to substitute the filter value dynamically during incremental loads.
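The substitution can be sketched as a pair of small helpers that build the parameterized query and advance the high-water mark after each run. The function and parameter names here are illustrative, not part of any Matillion API:

```python
def build_incremental_query(table: str, columns: list, watermark_column: str) -> str:
    """Return a parameterized incremental-load query; the caller binds the
    high-water mark recorded after the previous successful load."""
    return (
        f"SELECT {', '.join(columns)} FROM {table} "
        f"WHERE {watermark_column} > ? ORDER BY {watermark_column}"
    )

def next_watermark(rows, watermark_index: int, previous):
    """After a load, advance the watermark to the largest value seen,
    keeping the previous value if the batch was empty."""
    values = [row[watermark_index] for row in rows]
    return max(values) if values else previous
```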


Note:
Always validate datatypes and NULL value handling when moving between Sybase SQL Anywhere and Amazon Redshift to avoid truncation, conversion errors, or data loss.
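A pre-load check along these lines can catch would-be truncations and unexpected NULLs before rows reach Redshift. This is a generic sketch, not tied to any particular tool; column names in the example are illustrative:

```python
def validate_rows(rows, max_lengths):
    """Flag string values that would exceed a Redshift VARCHAR limit and
    count NULLs per column.

    rows: list of dicts mapping column name -> value.
    max_lengths: dict mapping column name -> maximum character length.
    Returns (truncation_errors, null_counts).
    """
    truncation_errors = []
    null_counts = {}
    for row_number, row in enumerate(rows):
        for column, value in row.items():
            if value is None:
                null_counts[column] = null_counts.get(column, 0) + 1
            elif (isinstance(value, str) and column in max_lengths
                    and len(value) > max_lengths[column]):
                truncation_errors.append((row_number, column, len(value)))
    return truncation_errors, null_counts
```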

Data Integration Architecture

Loading data in advance of integration is a fundamental advantage of the ELT (Extract, Load, Transform) architecture, as it allows you to divide and conquer the data pipeline by splitting it into two distinct steps. First, data is loaded in its raw form into Amazon Redshift, making it immediately available for downstream processing. Data integration itself inherently requires transformation to ensure consistency, quality, and suitability for analytics, and the most effective means of achieving this is to employ well-designed data transformation pipelines. With the ELT approach, both transformation and integration are executed directly within the Redshift database. This not only ensures that the process is fast, highly scalable, and available on-demand, but it also eliminates the need to invest in or maintain separate, external processing infrastructure, resulting in both operational efficiency and cost savings.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.