Integrate data from Firebird SQL to Databricks using Matillion

Our Firebird SQL to Databricks connector enables seamless data transfer to Databricks within minutes, ensuring your information remains current without the need for manual coding or complex ETL processes.


Extracting data from Firebird SQL to Databricks

Extracting data from Firebird SQL is a common requirement for organizations seeking to leverage advanced analytics platforms like Databricks. Whether you are consolidating legacy data, powering dashboards, or supporting data science initiatives, efficiently moving data from Firebird SQL to Databricks is a key step in the modern data pipeline.

This article will guide you through the essential steps for such an integration. We begin by demonstrating how to create an identity (user) in Firebird SQL, ensuring secure and appropriate access to your data source. For those using Matillion, we explain how to check for an existing JDBC driver or acquire a compatible one, which is necessary for connecting to Firebird. Next, we discuss considerations and best practices for establishing network connectivity between Firebird (the source system) and Databricks (the target platform), an essential task for seamless data transfer.

Finally, we walk through querying data in two distinct stages: extracting an initial full dataset and then moving to incremental querying to capture new or changed records over time. By the end of the article, you will be equipped with a practical approach for integrating Firebird SQL data into your Databricks environment.


What is Firebird SQL?

Firebird SQL is an open-source, cross-platform relational database management system (RDBMS) known for its lightweight footprint, strong SQL compliance, and high performance on Windows, Linux, and macOS. Forked from Borland’s InterBase in 2000, Firebird features multi-version concurrency control (MVCC), stored procedures, triggers, and robust transaction support. Its small size and low resource needs suit both embedded and enterprise applications. Firebird offers native APIs, ODBC/JDBC connectivity, third-party tool support, and is licensed under the Initial Developer’s Public License (IDPL), ensuring continued community-driven development and accessibility.


What is Databricks?

Databricks is a unified analytics platform built on Apache Spark, offering a scalable, cloud-based solution for big data and advanced analytics. It integrates with various data sources—including data lakes and other storage—and streamlines ETL, real-time data ingestion, and collaborative data science workflows. Its managed Lakehouse architecture merges the reliability of data warehouses with data lake scalability, supporting SQL analytics, machine learning, and streaming in one platform. Databricks uses native Delta Lake support for ACID transactions, scalable metadata, and data versioning, ensuring high performance and reliability for diverse enterprise needs.

Why Move Data from Firebird SQL into Databricks

Unlocking Analytics: Copying Data from Firebird SQL to Databricks for Enhanced Insights

A data engineer or architect may wish to copy data from Firebird SQL into Databricks because Firebird SQL often contains business-critical or potentially valuable information accumulated over time. However, the true potential of this data emerges when it is integrated with other organizational data sources, such as customer analytics platforms, sales systems, or external datasets. By transferring the data into Databricks, which is specifically designed for large-scale data integration and advanced analytics, organizations can unify diverse datasets to derive actionable insights. Additionally, performing data integration and analysis within Databricks, rather than directly on the Firebird SQL server, helps to minimize the computational and I/O workload on the Firebird system, ensuring that its performance and reliability for operational processes remain unaffected. This approach supports scalable analytics while preserving the integrity and responsiveness of critical transactional systems.

Creating a User in Firebird SQL

This guide explains how to create a new user (identity) within a Firebird SQL environment. In Firebird 2.5 and above, user management is typically performed using the `CREATE USER` SQL statement or the command-line administration tool.

Prerequisites

  • Administrative access: You must connect as a user with sufficient privileges (typically `SYSDBA`).

Using the SQL `CREATE USER` Statement (Firebird 3 and newer)

Firebird 3.0 introduced SQL-standard user management statements. To create a new user, connect to a database as `SYSDBA` and execute:

CREATE USER new_user_name
  PASSWORD 'StrongPassword'
  FIRSTNAME 'John'
  MIDDLENAME 'M'
  LASTNAME 'Doe'
;

Notes:

  • Only `CREATE USER ... PASSWORD '...'` is mandatory; the name fields are optional.
  • User management via SQL statements is not supported in Firebird 2.5 and earlier. In those cases, use the command-line tools (see below).

Using the Command-Line Tool (`gsec`)

For Firebird 2.5 and earlier, or for servers managing users externally, use the `gsec` tool.

  1. Open a terminal (command prompt).
  2. Run `gsec` (Firebird’s security manager):

gsec -user sysdba -password your_sysdba_password

  3. At the `GSEC>` prompt, create the new user:

add new_user_name -pw StrongPassword

Additional options such as `-fname` and `-lname` can be included as needed.

  4. Quit `gsec`:

quit
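The `gsec` tool also accepts its commands directly as arguments, so user creation can be scripted instead of typed at the interactive prompt. A minimal Python sketch of building that non-interactive invocation (the credentials shown are placeholders; run it only on a host where `gsec` is installed):

```python
import subprocess

def build_gsec_add_cmd(new_user: str, password: str,
                       admin_user: str = "sysdba",
                       admin_password: str = "your_sysdba_password") -> list:
    """Build a non-interactive gsec invocation that adds a user."""
    return [
        "gsec",
        "-user", admin_user,
        "-password", admin_password,
        "-add", new_user,
        "-pw", password,
    ]

cmd = build_gsec_add_cmd("new_user_name", "StrongPassword")
print(" ".join(cmd))
# On a host with Firebird installed, execute it with:
# subprocess.run(cmd, check=True)
```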

Granting Privileges

After creating a user, log in as a privileged user and grant the necessary access. For example, to grant `SELECT` privilege on the `EMPLOYEE` table:

GRANT SELECT ON EMPLOYEE TO new_user_name;

Tip: Modify privilege statements as appropriate for your application and security requirements.
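When the new user needs the same access to many tables, generating the `GRANT` statements programmatically avoids repetitive typing. A small Python sketch (the table list is hypothetical):

```python
def grant_select_statements(tables, user):
    """Generate a GRANT SELECT statement for each table in the list."""
    return [f"GRANT SELECT ON {table} TO {user};" for table in tables]

# Hypothetical table names for illustration:
for stmt in grant_select_statements(["EMPLOYEE", "DEPARTMENT", "SALES"],
                                    "new_user_name"):
    print(stmt)
```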


Installing the JDBC Driver

At the time of writing, the Firebird SQL JDBC driver is not bundled with Matillion Data Productivity Cloud by default. This is due to licensing or redistribution restrictions associated with the driver's distribution. If you need to connect to a Firebird SQL database from your Matillion instance, you’ll need to manually download and install the JDBC driver.

Downloading the Firebird JDBC Driver

  1. Navigate to the official Firebird JDBC driver page:
    https://www.firebirdsql.org/en/jdbc-driver/

  2. Choose the correct driver type:
    Look specifically for the “Type 4” JDBC driver, as this is a pure Java implementation and does not require native libraries. This driver is typically distributed as a JAR file.

  3. Download the latest compatible version of the Type 4 driver for your Firebird server version and intended use.

Installing the JDBC Driver in Matillion Data Productivity Cloud

  1. Refer to the Matillion documentation on uploading and installing external JDBC drivers:
    https://docs.matillion.com/data-productivity-cloud/agent/docs/uploading-external-drivers/

  2. Follow the instructions to upload the downloaded JDBC JAR file to your Matillion instance or agent as described. This process typically involves accessing the Agent or System Administration section, and using the user interface to upload the driver JAR file so it becomes available for use.

  3. Restart services or agents if prompted, to ensure the newly supplied driver is recognized.

Configuring and Using the Driver

  • Detailed usage instructions—including how to configure database queries and connections using the new driver—are available here:
    https://docs.matillion.com/data-productivity-cloud/designer/docs/database-query/

  • Ensure you reference the newly uploaded Firebird driver during database connection or query configuration.

By following these instructions, you’ll integrate the Firebird SQL JDBC driver into your Matillion environment and enable connectivity for your data loading and transformation workflows.
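Before uploading, it can be worth sanity-checking that the downloaded JAR actually contains the Jaybird driver class (`org.firebirdsql.jdbc.FBDriver` is the driver class Jaybird registers). A small Python sketch, assuming a locally downloaded file (the filename shown is hypothetical):

```python
import zipfile

def jar_contains_class(jar_path: str, class_name: str) -> bool:
    """Return True if the JAR (a ZIP archive) contains the given class entry."""
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()

# Example (hypothetical filename):
# jar_contains_class("jaybird-full.jar", "org.firebirdsql.jdbc.FBDriver")
```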

Checking network connectivity

Before connecting Matillion Data Productivity Cloud to your Firebird SQL database, you must ensure that the database is configured to accept incoming network connections from the appropriate sources. The required configuration depends on your deployment type (for example, whether you use a Full SaaS or Hybrid SaaS agent).

Additionally, if the Firebird SQL database is referenced using a DNS hostname (rather than a static IP address), ensure that the Full SaaS or Hybrid SaaS agent can successfully resolve the database's DNS address. Proper DNS resolution is required for Matillion to establish a successful connection.
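Both checks, DNS resolution and TCP reachability on Firebird's default port 3050, can be verified from the agent's network before configuring Matillion. A minimal Python sketch (the hostname is a placeholder):

```python
import socket

def can_resolve(host: str) -> bool:
    """Check that the host's DNS name resolves from this network."""
    try:
        socket.gethostbyname(host)
        return True
    except socket.gaierror:
        return False

def can_reach_firebird(host: str, port: int = 3050, timeout: float = 3.0) -> bool:
    """Attempt a TCP connection to the Firebird port (3050 by default)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical hostname; substitute your own server:
# if can_resolve("firebird.internal.example.com"):
#     print(can_reach_firebird("firebird.internal.example.com"))
```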

Querying Data from a Firebird SQL Database

This guide provides technical instructions and examples for querying data from a Firebird SQL database, with considerations for data type conversion and best practices for handling initial and incremental loads—such as in ELT jobs using Matillion or other data replication tools.


Example Firebird SQL SELECT Statements

Here are some common Firebird SQL queries for data extraction:

```
-- Select all columns from a table
SELECT * FROM EMPLOYEE;

-- Select specific columns
SELECT EMP_NO, FIRST_NAME, LAST_NAME, HIRE_DATE FROM EMPLOYEE;

-- Apply filters using WHERE clause
SELECT * FROM SALES WHERE ORDER_DATE >= '2024-01-01';

-- Aggregate results
SELECT DEPARTMENT, COUNT(*) AS TOTAL FROM EMPLOYEE GROUP BY DEPARTMENT;

-- Use joins between tables
SELECT e.FIRST_NAME, e.LAST_NAME, d.DEPARTMENT_NAME
FROM EMPLOYEE e
JOIN DEPARTMENT d ON e.DEPARTMENT_ID = d.DEPARTMENT_ID;
```


Datatype Conversion: Firebird SQL to Databricks

When extracting data from Firebird and loading into Databricks (often via Spark or cloud data warehouses), implicit datatype conversion may occur. For example:

| Firebird SQL Type | Common Databricks Equivalent |
| --- | --- |
| INTEGER | INT |
| VARCHAR(255) | STRING |
| TIMESTAMP | TIMESTAMP |
| DECIMAL(15,2) | DECIMAL(15,2) |
| BLOB | BINARY (or STRING with encoding) |

Tip: Always review automatic datatype mapping and, if needed, cast data explicitly in your SELECT statement for greater control, e.g.:

SELECT 
  CAST(EMP_NO AS INT) AS EMP_NO, 
  CAST(HIRE_DATE AS DATE) AS HIRE_DATE
FROM EMPLOYEE;
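When automating extraction across many tables, the mapping table above can be encoded as a lookup. A minimal Python sketch (the helper name and fallback-to-STRING behavior are illustrative choices, not part of any Matillion or Databricks API):

```python
# Lookup mirroring the mapping table above; unknown types fall back to STRING.
FIREBIRD_TO_DATABRICKS = {
    "INTEGER": "INT",
    "VARCHAR": "STRING",
    "TIMESTAMP": "TIMESTAMP",
    "DECIMAL": "DECIMAL",   # precision/scale carried through below
    "BLOB": "BINARY",
}

def map_type(firebird_type: str) -> str:
    """Map a Firebird column type (e.g. 'DECIMAL(15,2)') to a Databricks type."""
    base, sep, args = firebird_type.partition("(")
    target = FIREBIRD_TO_DATABRICKS.get(base.strip().upper(), "STRING")
    # DECIMAL keeps its precision and scale; other parameterized types drop them.
    if target == "DECIMAL" and sep:
        return f"DECIMAL({args.rstrip(')')})"
    return target

print(map_type("VARCHAR(255)"))   # STRING
print(map_type("DECIMAL(15,2)"))  # DECIMAL(15,2)
```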


Pattern: Initial and Incremental Data Loads

The best practice for orchestrating data extraction from Firebird SQL into analytic platforms such as Databricks is to perform:

  1. A once-off initial load: All qualifying data is extracted.
  2. Incremental loads: Only new or changed data is extracted in subsequent operations.

You can use a unified Database Query component (e.g., in Matillion) with adapted SQL for each phase.


1. Initial Load

  • No filter clause: Extract the entire data set.
  • Purpose: Full table load (usually at project start).

Example:

SELECT * FROM EMPLOYEE;
All data from `EMPLOYEE` is selected with no filtering.


2. Incremental Load

  • With filter clause: Only data updated or added since the last load is extracted.
  • Purpose: Ongoing, efficient replication.
  • Approach: Use a suitable watermark column, such as `LAST_MODIFIED`, to filter only new/changed rows.

Example:

SELECT * FROM EMPLOYEE
WHERE LAST_MODIFIED > '2024-06-15 23:59:59';
Replace the hard-coded timestamp with a run-time value, such as the last successful load timestamp.
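The two phases differ only in the presence of the watermark filter, so a single query builder can serve both. A minimal Python sketch of the pattern (the function name and watermark handling are illustrative, not Matillion's API; in production, take the timestamp from pipeline state and bind it as a query parameter rather than interpolating it into the SQL string):

```python
from datetime import datetime
from typing import Optional

def build_extract_query(table: str,
                        watermark_column: str = "LAST_MODIFIED",
                        last_loaded: Optional[datetime] = None) -> str:
    """Return a full-load query when no watermark is known,
    otherwise an incremental query filtered on the watermark column."""
    query = f"SELECT * FROM {table}"
    if last_loaded is not None:
        query += (f" WHERE {watermark_column} > "
                  f"'{last_loaded.strftime('%Y-%m-%d %H:%M:%S')}'")
    return query

print(build_extract_query("EMPLOYEE"))                           # initial load
print(build_extract_query("EMPLOYEE",
                          last_loaded=datetime(2024, 6, 15, 23, 59, 59)))
```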

For more detail on this pattern, visit the Matillion article:
Incremental Load (Data Replication Strategy)


Reminder: Always use the same Database Query component—with only the filtering SQL changing between initial and incremental loads—for consistency and ease of maintenance.

Data Integration Architecture

Loading data in advance of integration exemplifies the "divide and conquer" principle by splitting the process into two distinct steps: ingestion and transformation. This division is a key advantage of the ELT (Extract, Load, Transform) architecture, as it allows organizations to first load raw data into the Databricks database and then perform integration as a separate, subsequent step.

Data integration itself requires comprehensive data transformation to standardize, clean, and bring disparate data sources together. The best practice for these operations is the use of data transformation pipelines, which automate and orchestrate complex transformation tasks efficiently. Furthermore, an important benefit of the ELT approach is that both transformation and integration are performed within the Databricks environment, eliminating the need for external processing infrastructure. This integration within the target database facilitates on-demand scalability and rapid processing, while also reducing costs associated with maintaining separate transformation platforms.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.