Extracting data from Exasol to Snowflake
Extracting data from Exasol is a common requirement for organizations seeking to consolidate or migrate their data assets to modern cloud warehouses such as Snowflake. This process involves several essential steps to ensure both security and efficiency. In this article, we will guide you through creating an identity in Exasol for authorized data extraction, and, for Matillion ETL users, how to check for or install the necessary JDBC driver. We will also discuss how to configure appropriate network connectivity between Exasol and Snowflake, ensuring reliable data transfers. Finally, we will cover methods for querying and loading both initial and incremental datasets, helping you maintain up-to-date information in your Snowflake environment. Whether you are undertaking a one-time migration or setting up a recurring data pipeline, these steps will provide a robust foundation for your integration efforts.
What is Exasol?
Exasol is a high-performance, in-memory, columnar relational database management system (RDBMS) optimized for high-throughput analytics and complex query workloads at scale. Designed specifically for analytical processing, Exasol leverages massively parallel processing (MPP) architecture and advanced compression techniques to deliver fast query performance, even on large volumes of data. It supports standard SQL, integrates with popular analytics and BI tools, and can connect to data sources through a variety of interfaces such as ODBC, JDBC, and REST APIs. Exasol is available as on-premise, cloud, or hybrid deployments, offering flexible scalability and features such as self-tuning, high availability, and robust security to meet the needs of modern data-driven enterprises.
What is Snowflake?
Snowflake is a cloud-native data platform designed for scalable data warehousing, data lakes, and engineering workloads. Its multi-cluster shared data architecture separates compute, storage, and cloud services for efficient, dynamic scaling. Snowflake supports structured and semi-structured data (e.g., JSON, Avro, Parquet) and offers high-performance ingestion and queries. With a SQL-based interface, strong security, and broad integration, it enables analytics, data sharing, and machine learning without managing hardware. Operating as a fully managed service on major clouds, Snowflake provides automatic tuning, elastic provisioning, and pay-per-use pricing to minimize administrative tasks and maximize flexibility.
Why Move Data from Exasol into Snowflake
The Benefits of Replicating Exasol Data into Snowflake for Advanced Analytics
A data engineer or architect may wish to copy data from Exasol into Snowflake for several strategic reasons. Exasol often contains data that is potentially valuable for analytics, reporting, and decision-making, but this value is fully realized only when the data is integrated with information from other enterprise sources. By consolidating Exasol data within Snowflake’s modern data platform, organizations can more effectively join, transform, and analyze datasets in conjunction with other sources, thereby uncovering deeper business insights. Employing Snowflake as the environment for data integration also offers operational advantages: it prevents additional computational workload on the Exasol system itself, ensuring Exasol continues to perform optimally for its primary workloads while leveraging Snowflake’s scalability and advanced processing capabilities for integration and analytics.
Creating a User in Exasol
To create a new user in an Exasol database, you require sufficient administrative privileges (such as the CREATE USER system privilege). The following instructions and sample SQL scripts illustrate the process.
1. Basic User Creation
The fundamental statement for user creation is the CREATE USER statement. Replace the placeholders with your actual values:
```
CREATE USER username IDENTIFIED BY 'strong_password';
```
- username: The login name for the new user.
- strong_password: A secure password for the user.
Example:
```
CREATE USER dev_user IDENTIFIED BY 'S3cureP@ssw0rd!';
```
2. Granting Privileges
After creating the user, you typically grant necessary system and/or object privileges. For example, to allow a user to create sessions (log in) and work with a specific schema:
```
GRANT CREATE SESSION TO dev_user;
GRANT SELECT, INSERT, UPDATE ON SCHEMA schema_name TO dev_user;
```
Adjust the privileges to fit your use case.
3. Additional User Configuration (Optional)
For more controlled user management, you can:
- Set a password expiry date:

```
CREATE USER temp_user IDENTIFIED BY 'T3mpPassw0rd!' PASSWORD EXPIRE;
```
- Restrict user by IP address:

```
ALTER USER dev_user LIMIT CONNECT TO '192.168.10.100';
```
4. User Deletion (If Needed)
To remove an existing user:
```
DROP USER dev_user;
```
This command irrevocably deletes the user; it cannot be undone.
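The user-setup statements above can also be generated from a script. The sketch below is a minimal, illustrative Python helper that builds the CREATE USER and GRANT statements; in practice you would submit them to Exasol through a driver (for example, over JDBC). The function and argument names are hypothetical, not part of any Exasol or Matillion API.

```python
def build_user_setup_sql(username: str, password: str, schema: str) -> list[str]:
    """Build the Exasol statements for creating a read-capable extraction user.

    Caution: values are interpolated for illustration only; in production,
    validate or quote identifiers and passwords to avoid SQL injection.
    """
    return [
        f"CREATE USER {username} IDENTIFIED BY '{password}';",
        f"GRANT CREATE SESSION TO {username};",
        f"GRANT SELECT ON SCHEMA {schema} TO {username};",
    ]

# Example: statements for a dedicated extraction user
for stmt in build_user_setup_sql("etl_reader", "S3cureP@ssw0rd!", "sales"):
    print(stmt)
```

For extraction-only use cases, granting just CREATE SESSION and SELECT (as here) keeps the user's footprint minimal.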
Installing the JDBC Driver
To connect Matillion Data Productivity Cloud to an Exasol database, it is necessary to install the Exasol JDBC driver manually. As of the time of writing, Matillion does not bundle the Exasol JDBC driver by default. This is due to licensing or redistribution restrictions imposed by the driver’s publisher. The following instructions outline the process for downloading and installing the JDBC driver, as well as referencing further documentation for installation and usage within Matillion.
1. Download the Exasol JDBC Driver
- Go to the official Exasol JDBC driver download page: https://www.exasol.com/portal/display/DOWNLOADS/JDBC+Drivers
- On this page, look for a Type 4 JDBC driver (sometimes referred to as a "pure Java" driver). This is the most commonly supported and recommended type for use with data integration tools such as Matillion. Download the appropriate JAR file for your Matillion Agent environment.
2. Upload the Driver to Matillion
Install the JDBC driver into your Matillion Data Productivity Cloud deployment following Matillion’s procedure for adding external drivers. Detailed instructions are available in the Matillion documentation.
You’ll need to log in to your Matillion environment and follow the steps to upload and make the JAR file available to relevant agents or integrations.
3. Connect to Exasol
After uploading the driver, you can connect to your Exasol database from within Matillion. For comprehensive details on configuring your database queries and leveraging the new connection, consult the Matillion documentation.
By ensuring the Type 4 JDBC driver is available in your instance, you enable Matillion Data Productivity Cloud to establish secure and direct connections to your Exasol databases.
Checking network connectivity
To successfully connect Matillion Data Productivity Cloud to your Exasol database, you must ensure your database accepts incoming network connections from the appropriate sources, depending on your deployment configuration:
- Full SaaS agent configuration: The Exasol database must allow inbound connections from the static IP addresses used by Matillion’s cloud agents. You can find the up-to-date list of required IP addresses in the Matillion Security Documentation.
- Hybrid SaaS deployment: In this setup, the Exasol database should permit incoming connections from the IP ranges of your own virtual private cloud (VPC), where your private Hybrid SaaS agent is running. To test and validate network accessibility, utilities are available in the Matillion Network Access Checker.
Additionally, if you are referencing your Exasol database by a DNS hostname, make sure that the Full SaaS or Hybrid SaaS agent can resolve the DNS address. This typically requires appropriate DNS configuration and firewall rules allowing DNS lookup requests.
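Before configuring the connection, you can sanity-check DNS resolution and TCP reachability from the machine where your agent runs. The Python sketch below is a minimal diagnostic, assuming Exasol's default port of 8563; the hostname in the usage comment is a placeholder for your own deployment.

```python
import socket

def can_resolve(hostname: str) -> bool:
    """Return True if this machine can resolve the hostname via DNS."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False

def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example usage (hypothetical hostname; 8563 is Exasol's default port):
# can_resolve("exasol.internal.example.com")
# can_reach("exasol.internal.example.com", 8563)
```

If `can_resolve` fails, check the agent's DNS configuration; if `can_reach` fails, check firewall rules and allowed IP ranges on the Exasol side.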
Querying Data from Exasol Database
This guide describes how to query data from an Exasol database, including recommended loading strategies and important considerations for datatype compatibility with Snowflake.
Example Exasol SQL SELECT Statements
You can issue standard SQL queries against Exasol. Here are some typical query patterns:
```
-- Selecting all columns from a table
SELECT * FROM sales.orders;

-- Selecting specific columns with a WHERE clause filtering by a date
SELECT order_id, customer_id, amount
FROM sales.orders
WHERE order_date >= DATE '2024-01-01';

-- Joining tables
SELECT o.order_id, c.customer_name, o.amount
FROM sales.orders o
JOIN sales.customers c ON o.customer_id = c.customer_id
WHERE o.order_status = 'SHIPPED';

-- Aggregating data
SELECT customer_id, COUNT(*) AS total_orders, SUM(amount) AS total_spent
FROM sales.orders
GROUP BY customer_id;
```
Datatype Conversion: Exasol and Snowflake
When extracting data from Exasol for use in Snowflake, be aware of datatype differences—automatic conversion may occur. For example:
| Exasol Datatype | Snowflake Equivalent |
|---|---|
| VARCHAR, CHAR | VARCHAR |
| DECIMAL, DOUBLE, FLOAT | NUMBER, FLOAT |
| DATE, TIMESTAMP | DATE, TIMESTAMP_NTZ |
| BOOLEAN/BOOL | BOOLEAN |
| GEOMETRY, INTERVAL | Not natively supported |
Recommendation: Always review and test your ETL pipeline to ensure data is converted as expected, especially for date/time and floating-point types.
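As a rough illustration of the table above, the conversion can be expressed as a small lookup. The dictionary below is a simplification for testing purposes only, not an exhaustive or authoritative mapping; real conversions also depend on precision, scale, and your ETL tool's rules.

```python
# Simplified Exasol -> Snowflake type mapping, based on the table above.
EXASOL_TO_SNOWFLAKE = {
    "VARCHAR": "VARCHAR",
    "CHAR": "VARCHAR",
    "DECIMAL": "NUMBER",
    "DOUBLE": "FLOAT",
    "FLOAT": "FLOAT",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP_NTZ",
    "BOOLEAN": "BOOLEAN",
    "BOOL": "BOOLEAN",
}

def map_type(exasol_type: str) -> str:
    """Return the approximate Snowflake equivalent, or 'UNSUPPORTED'."""
    # Drop any length/precision suffix, e.g. VARCHAR(100) -> VARCHAR
    base = exasol_type.split("(")[0].strip().upper()
    return EXASOL_TO_SNOWFLAKE.get(base, "UNSUPPORTED")
```

A helper like this can be used in pipeline tests to flag columns (such as GEOMETRY or INTERVAL) that need special handling before loading.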
Best Practice: Initial and Incremental Loads
The recommended pattern is:

- Initial (Full) Load: Extract all relevant data with no filter.
- Incremental Load: Afterwards, extract only data that is newly added or recently changed.
Using the Database Query Component
You can use the Database Query component for both initial and incremental loads by configuring the SQL statement differently:
Initial Load Example
No filter clause—extracts the entire table:
```
SELECT * FROM sales.orders;
```
Incremental Load Example
With a filter for only newly added/changed rows, e.g., based on a timestamp column:
```
SELECT * FROM sales.orders WHERE order_date > '2024-06-01 00:00:00';
```
For a comprehensive guide on incremental loading and data replication strategy, see: Matillion Incremental Load Strategy
Note: Both load types can reuse the same Database Query component; simply modify the SQL to include or exclude the appropriate filter.
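The full-then-incremental pattern can be sketched end to end. The Python snippet below simulates it with an in-memory SQLite database standing in for Exasol (the table and column names mirror the examples above); the high-watermark logic is the point, not the driver.

```python
import sqlite3

def extract(conn, watermark=None):
    """Full load when watermark is None; otherwise only rows newer than it."""
    if watermark is None:
        sql, params = "SELECT order_id, order_date FROM orders", ()
    else:
        sql, params = ("SELECT order_id, order_date FROM orders "
                       "WHERE order_date > ?", (watermark,))
    return conn.execute(sql, params).fetchall()

# SQLite stands in for Exasol purely for illustration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "2024-05-30"), (2, "2024-06-02")])

initial = extract(conn)                     # initial (full) load: all rows
watermark = max(row[1] for row in initial)  # remember the high-water mark
conn.execute("INSERT INTO orders VALUES (3, '2024-06-05')")
incremental = extract(conn, watermark)      # incremental load: only new rows
```

Persisting the watermark between pipeline runs (for example, in a control table or pipeline variable) is what turns this into a repeatable incremental load.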
By following these principles, you can efficiently query and transfer data from Exasol to Snowflake in a robust, scalable manner.
Data Integration Architecture
In a modern data integration workflow, loading data in advance of integration—known as the Extract, Load, and Transform (ELT) architecture—enables organizations to divide and conquer the process by splitting it into two distinct steps: first loading raw data into the target database, and then integrating and transforming that data as needed. This approach is particularly advantageous because it allows data transformation—which is critical for integration—to be executed within robust data transformation pipelines. These pipelines automate and orchestrate the sequence of operations needed to clean, standardize, and join data from multiple sources. By leveraging ELT, both the transformation and integration steps occur inside the target Snowflake database, delivering additional benefits such as rapid, on-demand scalability and reduced infrastructure costs; there is no need for separate data processing platforms or servers, as all computation is performed natively within Snowflake’s highly performant environment.
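The ELT split described above can be illustrated in miniature: raw data lands first, untouched, and the transformation then runs as SQL inside the target database. In the sketch below, an in-memory SQLite database stands in for Snowflake, and the table names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Step 1 (Extract + Load): land raw data unmodified in a staging table.
conn.execute("CREATE TABLE stg_orders (order_id INTEGER, amount TEXT)")
conn.executemany("INSERT INTO stg_orders VALUES (?, ?)",
                 [(1, "10.50"), (2, "4.25")])

# Step 2 (Transform): clean and standardize inside the target database
# itself -- in a real ELT pipeline this is where Snowflake's compute
# would do the work, so no separate processing server is needed.
conn.execute("""
    CREATE TABLE orders AS
    SELECT order_id, CAST(amount AS REAL) AS amount
    FROM stg_orders
""")

total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
```

Keeping the raw staging table intact also means transformations can be re-run or revised later without re-extracting from Exasol.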