Extracting data from Actian Ingres to Databricks
Extracting data from Actian Ingres is a key activity for organizations looking to leverage cloud analytics platforms such as Databricks. By moving data from Ingres into Databricks, teams can unlock deeper insights and more scalable analytics workflows. This article will guide you through the essential steps of this process. We will begin by creating an identity in Actian Ingres to facilitate secure and auditable access. For those using Matillion as an ETL tool, we will outline how to verify or obtain the appropriate JDBC driver necessary for connecting to Ingres. To ensure a smooth transfer, we will cover best practices for establishing reliable network connectivity between your Ingres source and Databricks target environments. Finally, the article will walk you through the process of querying and loading your data—starting with an initial full extract, and progressing to strategies for efficient incremental data loading. Whether you are beginning your data migration journey or optimizing existing pipelines, the following sections will provide clear guidance to efficiently move your data from Actian Ingres into Databricks.
What is Actian Ingres?
Actian Ingres is a relational database management system (RDBMS) designed for enterprise-grade applications requiring high reliability, data integrity, and concurrent transactional performance. Originating from the Ingres project at the University of California, Berkeley, Ingres has evolved to support modern SQL standards, ACID-compliant transactions, and sophisticated security models. It offers features such as multi-version concurrency control (MVCC), powerful optimization for complex analytical queries, and robust enterprise connectivity, including support for JDBC, ODBC, and .NET clients. With its efficient storage engine and administrative tools, Actian Ingres is deployed across industries for mission-critical OLTP and reporting workloads, particularly where open-source flexibility and low total cost of ownership are essential.
What is Databricks?
Databricks is a cloud-based data platform featuring Databricks SQL Warehouses, which offer scalable managed data warehousing and advanced data lake capabilities on Apache Spark. Leveraging Delta Lake, it enables ACID transactions, schema enforcement, and time travel on cloud storage like AWS S3, Azure, and Google Cloud. By decoupling storage and compute, Databricks delivers high-performance analytics, streaming, and supports both SQL and programmatic interfaces for data engineering, machine learning, and business intelligence. It provides advanced governance, collaborative workspace management, and integrates with open formats like Parquet and Delta, making it ideal for enterprises unifying analytics and AI workloads at scale.
Why Move Data from Actian Ingres into Databricks
Unlocking Advanced Analytics: The Benefits of Transferring Data from Actian Ingres to Databricks
A data engineer or architect may choose to copy data from Actian Ingres into Databricks for several compelling reasons. Actian Ingres often contains business-critical data that holds significant potential value when leveraged correctly. By integrating this data with information from other sources within Databricks, organizations can uncover new insights and produce more comprehensive analytics, thereby maximizing the utility of their data assets. Furthermore, performing the integration and analytics within the Databricks environment helps to minimize the operational load on the Actian Ingres system, ensuring that transactional performance and stability are not compromised while still making use of the underlying data for advanced analytics and data-driven decision making.
Creating a User in Actian Ingres Database
This guide outlines the steps required to create a new user identity for Actian Ingres. Ingres relies on operating system (OS) user accounts for database authentication; database users are not managed within Ingres itself, but instead at the OS level. Permissions within the database may then be managed for these users. For most deployments, follow the instructions below.
1. Create the Operating System User
Before a user can access an Ingres database, you need to create a corresponding OS user account on the database server.
On Linux/Unix
```
sudo useradd -m newuser
sudo passwd newuser
```
Replace `newuser` with the desired username. Set the user's password when prompted.
On Windows
Create a user using the Control Panel or the following command in Command Prompt (as an administrator):
```
net user newuser strong_password /add
```
Replace `newuser` and `strong_password` with the appropriate username and a secure password.
2. Grant Database Permissions
After creating the OS user, assign specific database permissions by logging into Ingres as a DBA (such as `ingres` or an OS user with DBA rights):
```
sql dbname
```
Replace `dbname` with your actual database name.
Example SQL: Grant Privileges
```sql
-- Grant CONNECT privilege to the new user
GRANT CONNECT TO newuser;

-- Optionally, grant further privileges such as table-level rights:
GRANT SELECT, INSERT, UPDATE, DELETE ON tablename TO newuser;
```
Note: Replace `tablename` with actual table names.
3. Verifying Access
Log out of your session and access Ingres with the new user credentials:
```
sql dbname -u newuser
```
You will be prompted for the password if OS authentication is required.
Notes
- By default, Ingres uses OS authentication. If your deployment uses database authentication (`DBMS_AUTHENTICATION`), consult your system documentation for alternative methods.
- Additional privilege management, group assignments, and advanced authentication options can be implemented; refer to the official Actian Ingres documentation.
Installing the JDBC driver
The Actian Ingres JDBC driver is not bundled by default with Matillion Data Productivity Cloud, due to licensing or redistribution restrictions. To work with Ingres databases in Matillion, you must acquire and install the driver manually. This guide will walk you through the process.
1. Downloading the JDBC Driver
Go to the Actian homepage for Ingres:
https://www.actian.com/products/actian-ingres/
- On the product page, locate and download the Type 4 JDBC driver for Ingres. The Type 4 driver is a pure Java driver, which offers improved portability and doesn’t require native library dependencies, making it the preferred choice for deployment in cloud environments like Matillion.
- You will typically receive a `.jar` file for the JDBC driver.
2. Uploading the Driver to Matillion Data Productivity Cloud
Since Matillion does not provide the Actian Ingres JDBC driver out of the box, you must upload the downloaded `.jar` file to your Matillion agent environment.
- Refer to Matillion's official instructions here: Uploading External Drivers
- Follow the steps described in the documentation for uploading external JDBC drivers. Pay close attention to driver file compatibility, versioning, naming conventions, and where to place the `.jar` file relative to your agent deployment.
3. Configuring and Using the JDBC Driver
After uploading the driver, you can begin using it within Matillion Data Productivity Cloud to create database connections and perform queries.
- For step-by-step usage guidance, consult: Using JDBC Drivers in Database Query Components
- The documentation provides detailed information on how to reference the newly installed driver and configure connection parameters for your Ingres database within Matillion's user interface.
By following these steps and referencing the provided documentation links, you can integrate Ingres database access into your Matillion Data Productivity Cloud environment.
Checking network connectivity
To establish a connection between Matillion Data Productivity Cloud and your Actian Ingres database, you must ensure that your database allows incoming network traffic according to your deployment configuration:
- Full SaaS Agent Configuration: Allow incoming connections from the outbound IP addresses used by Matillion's SaaS platform. The required list of IP addresses is maintained and updated at: https://docs.matillion.com/data-productivity-cloud/security/docs/allowing-ip-addresses/
- Hybrid SaaS Deployment: Your Actian Ingres database needs to accept connections from the address range of your own virtual private cloud (VPC). To assist with checking network accessibility from your VPC, helpful utilities are available at: https://exchange.matillion.com/data-productivity-cloud/pipeline/check-network-access/
If your Actian Ingres database instance is referenced using a DNS hostname rather than a static IP, ensure that your Full SaaS or Hybrid SaaS agent has appropriate DNS resolution capability for that address. This is required for the agent to correctly locate and connect to your database.
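To sanity-check both requirements from the machine running your agent, a small script like the following can be useful. This is a hypothetical helper (assuming Python is available on that host), not part of Matillion or Ingres; the port is whatever your Ingres installation exposes for JDBC traffic.

```python
import socket

def check_database_reachable(host, port, timeout=5.0):
    """Verify DNS resolution and TCP connectivity to a database listener."""
    try:
        # DNS resolution check: matters when the database is referenced
        # by hostname rather than a static IP.
        addr = socket.getaddrinfo(host, port)[0][4]
    except socket.gaierror as exc:
        return False, f"DNS resolution failed: {exc}"
    try:
        # TCP connectivity check: confirms firewall/allow-list rules.
        with socket.create_connection((host, port), timeout=timeout):
            return True, f"Connected to {addr[0]}:{port}"
    except OSError as exc:
        return False, f"TCP connection failed: {exc}"
```

A `False` result on the first check points at DNS configuration; on the second, at firewall rules or IP allow-lists.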
How to Query Data from an Actian Ingres Database
This guide explains how to run SQL queries against an Actian Ingres database, with considerations for datatype conversions to platforms like Databricks, and best practices for initial and incremental data loads using ETL tools such as Matillion.
Example: SQL SELECT Queries for Actian Ingres
Below are some examples of basic and advanced SQL queries you can run on an Actian Ingres database:
Simple SELECT
SELECT first_name, last_name, email FROM customer;
SELECT with WHERE Clause
SELECT order_id, order_date, total_amount FROM orders WHERE total_amount > 500.00;
Aggregation and Grouping
SELECT product_id, COUNT(*) AS total_sales
FROM sales
WHERE sale_date >= date('2024-01-01')
GROUP BY product_id;
Join Query
SELECT c.customer_id, c.first_name, o.order_id, o.order_date
FROM customer c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date > date('2024-01-01');
Datatype Conversion Concerns (Actian Ingres to Databricks)
When moving data from Actian Ingres to Databricks, automatic or manual datatype conversions may need to take place. Some typical patterns and concerns include:
- Date/Time Types: Ingres `date`, `ansidate`, or `timestamp` types may need mapping to Databricks' `DATE` or `TIMESTAMP`.
- Character Types: Ingres `char` or `varchar` become Databricks `STRING`.
- Numeric Types: Ingres `integer`, `float`, and `decimal` types generally map well, but be mindful of range and precision differences.
- Boolean Types: Depending on schema, Ingres may use integer (`0/1`) or logical (`TRUE/FALSE`) values for booleans.
Test your extraction and transformation logic thoroughly to verify that data fidelity is maintained across platforms.
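To make the mapping concrete, here is an illustrative (not exhaustive) lookup sketched in Python. The Databricks targets follow the patterns above; the `STRING` fallback for unrecognized types is an assumption you should adjust to your own schema.

```python
# Illustrative Ingres -> Databricks type mapping; extend for your schema.
INGRES_TO_DATABRICKS = {
    "date": "DATE",            # legacy Ingres DATE may also carry a time part
    "ansidate": "DATE",
    "timestamp": "TIMESTAMP",
    "char": "STRING",
    "varchar": "STRING",
    "integer": "INT",
    "float": "DOUBLE",
    "decimal": "DECIMAL",      # carry precision/scale over explicitly
}

def map_type(ingres_type: str) -> str:
    """Map an Ingres type name to a Databricks SQL type (STRING fallback)."""
    return INGRES_TO_DATABRICKS.get(ingres_type.lower(), "STRING")
```

A table like this is also a convenient place to record the precision/scale decisions you make for `decimal` columns, so they are applied consistently across pipelines.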
Initial and Incremental Data Loading Pattern
Recommended approach:
Use a Database Query component (such as that in Matillion) for both initial and incremental loads.
Initial Load
- Behavior: Load all available data (full table scan).
- Query Pattern: No `WHERE` filter is used; the query retrieves the entire dataset.
SELECT * FROM orders;
Incremental Load
- Behavior: Load only new or updated records since the last extraction.
- Query Pattern: Add a filter clause to select data based on a high-water mark (e.g., a `last_updated` timestamp or an incrementing primary key).
SELECT * FROM orders WHERE last_updated > '2024-06-01 00:00:00';
Or, if using a numeric key:
SELECT * FROM orders WHERE order_id > 12345;
You can parameterize the value (`'2024-06-01 00:00:00'` or `12345`) to automate incremental extraction in your ETL workflow.
For further details, see: Incremental Load (Data Replication Strategy) – Matillion Exchange
Summary of Best Practice
- Same SQL query template (Database Query component) for both load types.
- No filter clause for initial/full load.
- Parameterize filter clause for incremental loads using reliable, monotonically increasing columns (timestamps or integer IDs).
- Ensure datatype compatibility before loading data to the destination (such as Databricks).
Data Integration Architecture
Loading data into your Databricks database before integrating it is an effective way to "divide and conquer" the challenges of data integration, by splitting the process into two manageable steps. This approach, characteristic of the ELT (Extract, Load, Transform) architecture, offers a major advantage: data is first ingested into the target system, and only then are the necessary transformations and integrations performed. Data integration typically requires transforming source data so it conforms with the target schema and business logic, and this is best accomplished using robust data transformation pipelines. Another significant benefit of the ELT architecture is that both transformation and integration tasks are executed inside the target Databricks database itself. This ensures that processing is fast, on-demand, and highly scalable, while also removing the need to invest in or manage external data processing infrastructure.