Extracting data from Sybase SQL Anywhere to Snowflake
Extracting data from Sybase SQL Anywhere is a key step in modernizing data architectures and enabling advanced analytics. As organizations increasingly rely on cloud data platforms like Snowflake, the ability to move data efficiently and reliably from legacy systems to the cloud has become essential. This article provides a practical guide to transferring data from Sybase SQL Anywhere into Snowflake, focusing specifically on workflow and configuration options relevant to Matillion users. We will begin by outlining how to create an identity in Sybase SQL Anywhere to ensure you have the appropriate permissions for data extraction. Next, we will walk through checking for the necessary JDBC driver in your Matillion environment and acquiring it if needed. Ensuring robust network connectivity between the Sybase source and the Snowflake target is also critical, and we will discuss key considerations in this area. Finally, the article will delve into strategies for querying data – starting with an initial extract and progressing to efficient incremental loads. By following these steps, you will be equipped to build a reliable data pipeline from Sybase SQL Anywhere to Snowflake.
What is Sybase SQL Anywhere?
Sybase SQL Anywhere, now maintained by SAP, is an embedded relational database management system ideal for mission-critical, mobile, remote, and embedded applications. It features robust SQL compliance, high availability, synchronization, automatic tuning, snapshot isolation, multi-threaded architecture, and encrypted storage. Supporting ODBC, JDBC, .NET, and native APIs, it integrates easily with various development frameworks and operating systems. Renowned for its small footprint, efficient resource use, and minimal administrative overhead, SQL Anywhere delivers reliable performance across diverse deployment scenarios, from desktops and servers to IoT devices.
What is Snowflake?
Snowflake is a cloud-native, fully managed data platform that separates storage and compute, enabling dynamic resource scaling for diverse workloads such as data warehousing, data lakes, and analytics. It leverages a multi-cluster shared data architecture, allowing multiple users and processes to access the same data without contention or performance degradation. Snowflake supports structured and semi-structured data (including JSON, Avro, and Parquet) and integrates with various data ingestion, transformation, and analysis tools. Security features like automatic encryption, role-based access control, and support for compliance standards are integral, while built-in features such as time travel, zero-copy cloning, and data sharing facilitate agility, efficient data management, and collaboration across teams and organizations.
Why Move Data from Sybase SQL Anywhere into Snowflake
Unlocking Analytical Insights: Copying Data from Sybase SQL Anywhere to Snowflake
There are several compelling reasons why a data engineer or architect might wish to copy data from Sybase SQL Anywhere into Snowflake. Firstly, Sybase SQL Anywhere often holds data that is potentially valuable for business intelligence, analytics, or operational reporting, but this value is only fully realized when it is accessible alongside other key datasets. Integrating data from Sybase SQL Anywhere with data from other enterprise systems in a unified environment such as Snowflake enables comprehensive analysis and more informed decision-making. Additionally, leveraging Snowflake for this integration process confers significant operational advantages; by offloading transformation and analysis workloads to Snowflake’s robust, scalable platform, organizations avoid placing additional strain on the production Sybase SQL Anywhere system. This approach helps maintain the performance and stability of the source database while maximizing the utility and accessibility of its data.
Creating a User in Sybase SQL Anywhere
To add a new user to a Sybase SQL Anywhere database, use the CREATE USER statement. This grants the new user database access with the specified credentials.
Prerequisites
- You must be connected to the database as a user with sufficient privileges (typically, the DBA or a user with CREATE USER or DBA authority).
Instructions
1. Connect to Your Database
Connect to your database using your client tool of choice (e.g., Interactive SQL, SQL Central, or via ODBC/JDBC).
-- Example connection string (adjust for your environment): CONNECT TO demo USER DBA IDENTIFIED BY sql;
2. Create the User
Run the following SQL script, replacing username and password with your desired credentials:
CREATE USER username IDENTIFIED BY 'password';
Example
CREATE USER reporting IDENTIFIED BY 'Rep0rtPass!';
This statement creates a user called reporting with the password Rep0rtPass!.
3. Grant Appropriate Permissions (Optional)
By default, a new user has minimal rights. Assign privileges as required. For example, to grant SELECT permission on a table:
GRANT SELECT ON table_name TO username;
Example
GRANT SELECT ON sales_data TO reporting;
4. (Optional) Grant DBA Authority
If the new user requires database administrator capabilities, use:
GRANT DBA TO username;
Example
GRANT DBA TO reporting;
Additional Notes
- Passwords are case-sensitive and can include special characters.
- You can review all users by querying SYSUSER:
SELECT * FROM SYSUSER;
- To drop a user later:
DROP USER username;
Be sure to tailor privileges to the needs of each user to maintain database security.
Installing the JDBC driver
The Sybase SQL Anywhere JDBC driver is required for Matillion Data Productivity Cloud to connect to Sybase SQL Anywhere databases. At the time of writing, due to licensing and redistribution restrictions, this driver is not bundled with Matillion Data Productivity Cloud. Users must manually obtain and install the JDBC driver before SQL Anywhere integrations can be configured.
Downloading the JDBC driver
- Visit the official Sybase SQL Anywhere page: https://www.sap.com/products/sybase-sql-anywhere.html
- Locate the section for JDBC drivers. Look specifically for a Type 4 JDBC driver because it is a pure Java driver, platform-independent, and recommended for cloud-based integrations such as Matillion.
- Download the latest version of the Type 4 JDBC driver that matches your SQL Anywhere server version.
Note: Access to the download may require creating an SAP account and accepting relevant license agreements.
Preparing for installation
- After downloading, extract the JAR file(s) supplied as the Type 4 JDBC driver.
- Record the full path to the main JDBC driver JAR file. Additional dependency JARs, if any, must be gathered as well.
Uploading the driver to Matillion Data Productivity Cloud
- Follow the step-by-step instructions provided by Matillion for uploading external JDBC drivers: https://docs.matillion.com/data-productivity-cloud/agent/docs/uploading-external-drivers/
- Use the Matillion Agent interface or method suitable for your deployment model.
- Upload the JDBC driver JAR file(s), ensuring all required files are included and correctly referenced.
- Restart or refresh the agent process as instructed to complete registration of the new driver.
Using the Sybase SQL Anywhere JDBC driver
Once the driver is installed, you can use it within Matillion Data Productivity Cloud for database connectivity and queries. Detailed instructions for configuring and using the driver with workflow components are available at:
https://docs.matillion.com/data-productivity-cloud/designer/docs/database-query/
These usage instructions cover setup of connections, parameter requirements, and best practices for leveraging the new JDBC driver within your Matillion workflows.
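To make the connection details concrete, the sketch below assembles a SQL Anywhere JDBC connection URL of the kind a Database Query component would reference. The Host and DatabaseName parameter names, and the default network port 2638, follow the common SQL Anywhere connection-string convention, but verify them against the documentation shipped with your driver version.

```python
def build_sqlanywhere_jdbc_url(host, port=2638, database=None):
    """Assemble a JDBC URL for the SQL Anywhere Type 4 driver.

    Parameter names (Host, DatabaseName) follow the usual SQL Anywhere
    connection-string convention; confirm against your driver's docs.
    2638 is the default SQL Anywhere network server port.
    """
    url = f"jdbc:sqlanywhere:Host={host}:{port}"
    if database:
        url += f";DatabaseName={database}"
    return url


# For example, a server "db.example.com" hosting the database "demo":
url = build_sqlanywhere_jdbc_url("db.example.com", 2638, "demo")
```

Credentials are normally supplied separately through the Matillion connection configuration rather than embedded in the URL.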
Checking network connectivity
To ensure successful connectivity between Matillion Data Productivity Cloud and your Sybase SQL Anywhere database, you must configure your network to allow incoming connections from the correct sources, depending on your deployment type:
- Full SaaS agent configuration: ensure that your Sybase SQL Anywhere database allows incoming connections from the public IP addresses used by Matillion agents. The list of IP addresses to allow can be found at: https://docs.matillion.com/data-productivity-cloud/security/docs/allowing-ip-addresses/
- Hybrid SaaS deployment: you must allow incoming connections from resources within your own virtual private cloud (VPC) where the Matillion Hybrid SaaS agent is running. To assist with verifying network paths between your VPC and the target database, you can use the tools available at: https://exchange.matillion.com/data-productivity-cloud/pipeline/check-network-access/
Additionally, if you are referencing the Sybase SQL Anywhere database by a DNS address (rather than by IP address), ensure that the Full SaaS or Hybrid SaaS agent you are using is able to resolve the DNS name to the correct IP address. This may involve managing DNS resolution settings or providing access to required name servers in your networking infrastructure.
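The two checks described above, DNS resolution and TCP reachability, can be sketched in a few lines of Python. This is a diagnostic aid you might run from a host in the same network as the agent, not part of the Matillion product itself; the default port 2638 is the standard SQL Anywhere network port and should be adjusted to match your server.

```python
import socket

def check_database_reachable(host, port=2638, timeout=5.0):
    """Return (resolved_ip, reachable) for the given database host.

    First resolves the DNS name (mirroring what the Matillion agent must
    be able to do), then attempts a plain TCP connection to the port.
    """
    try:
        ip = socket.gethostbyname(host)
    except socket.gaierror:
        return None, False          # DNS resolution failed
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True         # TCP handshake succeeded
    except OSError:
        return ip, False            # resolved, but the port is unreachable
```

A (None, False) result points to a DNS problem, while (ip, False) points to a firewall or routing problem: exactly the two failure modes discussed in this section.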
Querying Data from Sybase SQL Anywhere
This guide describes how to extract data from a Sybase SQL Anywhere database using SQL queries. These instructions are relevant for users seeking to load data—initially or incrementally—into a destination such as Snowflake, possibly utilizing a tool such as Matillion, in which datatype conversions may occur.
Example SQL SELECT Statements
Below are sample SQL queries for extracting data from a Sybase SQL Anywhere database.
Simple SELECT all:
SELECT * FROM sales.orders;
Selecting specific columns:
SELECT order_id, customer_id, order_date, total_amount FROM sales.orders;
Using WHERE to filter rows (for incremental loads):
SELECT * FROM sales.orders WHERE order_date > '2024-01-01';
Selecting using JOIN:
SELECT o.order_id, c.customer_name
FROM sales.orders AS o
JOIN sales.customers AS c
ON o.customer_id = c.customer_id;
Datatype Conversion Considerations
When transferring data from Sybase SQL Anywhere to Snowflake, data types may require conversion. Examples:
- Sybase VARCHAR and CHAR map to Snowflake VARCHAR.
- Sybase INT and SMALLINT map to Snowflake NUMBER.
- Date/time types (DATE, TIME, TIMESTAMP) generally convert directly, but verify compatibility.
- Watch for differences in handling nulls, case sensitivity, or time zone data.
Always validate your mappings and test with sample data.
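One lightweight way to make such mappings explicit and testable is to keep them in a lookup table in code. The sketch below is illustrative and deliberately incomplete; the choice of TIMESTAMP_NTZ as the timestamp target is an assumption that depends on how your source handles time zones, so validate every entry against your own schema.

```python
# Illustrative (not exhaustive) mapping of common Sybase SQL Anywhere
# types to Snowflake types. Extend and validate against your own schema;
# TIMESTAMP_NTZ is an assumption and may need to be TIMESTAMP_TZ/LTZ
# depending on your time zone requirements.
SYBASE_TO_SNOWFLAKE = {
    "VARCHAR": "VARCHAR",
    "CHAR": "VARCHAR",
    "INT": "NUMBER",
    "SMALLINT": "NUMBER",
    "BIGINT": "NUMBER",
    "DECIMAL": "NUMBER",
    "DATE": "DATE",
    "TIME": "TIME",
    "TIMESTAMP": "TIMESTAMP_NTZ",
}

def map_type(sybase_type):
    """Look up the Snowflake type for a Sybase type name (case-insensitive)."""
    try:
        return SYBASE_TO_SNOWFLAKE[sybase_type.upper()]
    except KeyError:
        raise ValueError(f"No mapping defined for Sybase type {sybase_type!r}")
```

An unmapped type then fails loudly at pipeline-build time instead of silently producing a wrong column definition in Snowflake.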
Loading Patterns: Initial vs. Incremental Loads
The preferred approach for integrating Sybase SQL Anywhere data is to run a one-off initial extraction, then perform regular incremental extracts using the same Database Query component with query variations as follows:
Initial Load
Execute the Database Query with no filter condition to extract the full dataset.
SELECT * FROM sales.orders;
Incremental Load
Add a filter clause restricting extraction to new or changed records only (for example, using a last_updated column).
SELECT * FROM sales.orders WHERE last_updated > '2024-06-01 00:00:00';
Learn more about this approach and best practices in Matillion’s article on incremental-load data replication strategy.
By adapting the query's WHERE clause for incremental loads, you can ensure efficiency and data consistency across your integration process.
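The initial and incremental patterns above differ only in the presence of the watermark filter, which makes them easy to generate from one helper. The sketch below is a minimal illustration of that idea; in a real pipeline the watermark value would be stored between runs (for example in a Snowflake control table or a pipeline variable), and bind parameters should be preferred over string interpolation.

```python
def build_extract_query(table, watermark_column=None, watermark_value=None):
    """Return the SQL for an initial or incremental extract.

    With no watermark, the full table is selected (initial load); with a
    watermark column and last-seen value, only rows changed since the
    previous run are selected (incremental load). String interpolation is
    used here purely for illustration; production code should bind the
    watermark as a query parameter.
    """
    query = f"SELECT * FROM {table}"
    if watermark_column and watermark_value:
        query += f" WHERE {watermark_column} > '{watermark_value}'"
    return query


initial = build_extract_query("sales.orders")
incremental = build_extract_query(
    "sales.orders", "last_updated", "2024-06-01 00:00:00"
)
```

Persisting the maximum last_updated value seen on each run, and feeding it back in as the next watermark, completes the incremental-load loop.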
Data Integration Architecture
Loading data in advance of integration exemplifies the "divide and conquer" approach within the ELT (Extract, Load, Transform) architecture by breaking the process into two distinct steps: first loading raw data into the target Snowflake environment, and then performing necessary integrations and transformations. This separation allows each phase to be managed independently, enhancing flexibility and control. Data integration often involves extensive transformation, and the most effective way to manage these activities is through dedicated data transformation pipelines. These pipelines automate and orchestrate the transformation logic, ensuring reliable, repeatable processes. Additionally, a significant benefit of the ELT architecture is that all data transformation and integration tasks are performed directly in the Snowflake database. As a result, transformations are not only fast and scalable—leveraging Snowflake’s high-performance processing capabilities and elastic compute—but also cost-effective, as there is no need to invest in or maintain separate data processing infrastructure outside of your Snowflake environment.
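The "divide and conquer" split can be summarized in a toy sketch: the load step lands raw rows untouched, and a separate transform step runs later against the already-loaded data. The function names below are illustrative placeholders, not a real Matillion or Snowflake API; in practice the staging area is a Snowflake table and the transform is a SQL pipeline.

```python
# Minimal sketch of the ELT split: extract/load only appends raw rows;
# all transformation happens afterwards as an independent step.
def extract_and_load(raw_rows, staging):
    """Load step: append raw rows, unmodified, into a staging area."""
    staging.extend(raw_rows)

def transform(staging):
    """Transform step: runs later, entirely against loaded data."""
    return [
        {**row, "total_amount": round(row["total_amount"], 2)}
        for row in staging
    ]

staging_table = []
extract_and_load([{"order_id": 1, "total_amount": 19.994}], staging_table)
clean = transform(staging_table)
```

Because the two steps share only the staging area, either can be rerun, scheduled, or scaled independently, which is the practical payoff of the ELT architecture described above.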