Extracting data from Mimer SQL to Databricks
Extracting data from Mimer SQL is a critical step in enabling advanced analytics, reporting, and data integration. For organizations seeking to harness the scale and flexibility of Databricks, seamless data movement between source systems and cloud platforms must be both robust and efficient. This article provides a practical guide to transferring data from Mimer SQL to Databricks, outlining the necessary steps and considerations to ensure a reliable workflow. We will begin by detailing the process of creating an identity in Mimer SQL, which is essential for secure and controlled data access. For users working with Matillion, we’ll explain how to verify the presence of the required JDBC driver—or acquire it if needed—to enable communication between Matillion and the Mimer SQL database. Ensuring proper network connectivity between your Mimer SQL source and the Databricks target will also be covered, helping you avoid common connectivity pitfalls. Finally, the article will discuss strategies for querying data, including both initial full loads and incremental extraction, so that you can keep your target environment synchronized and up to date. Whether you are new to data integration or enhancing an established ETL pipeline, these step-by-step instructions will help you move your data confidently and efficiently.
What is Mimer SQL?
Mimer SQL is a high-performance, fully ACID-compliant relational database management system designed for both embedded and enterprise applications. Engineered for reliability, scalability, and minimal administrative overhead, it supports standard ANSI SQL and offers multimodal interfaces such as ODBC, JDBC, and ADO.NET for cross-platform integration. Mimer SQL’s architecture is optimized for real-time performance, with a focus on seamless multithreading and process isolation, making it well-suited for mission-critical workloads in fields like telecom, automotive, and healthcare. Additionally, its zero-maintenance, self-tuning features, and small footprint deployability position it as a robust solution for embedded systems, while still offering full compatibility for large-scale server implementations.
What is Databricks?
Databricks is a cloud-based analytics platform built on Apache Spark, offering unified big data processing, machine learning, and collaborative analytics. Its Databricks Lakehouse merges data warehouse and data lake features, storing structured and unstructured data using Delta Lake for ACID compliance and scalable metadata. Supporting SQL, Python, Scala, and R, it enables seamless teamwork among data professionals. Key features include auto-scaling clusters, end-to-end data lineage, and native integration with AWS, Azure, and GCP. Databricks streamlines data workflows and ensures enterprise-grade security, governance, and performance, making it a powerful solution for advanced analytics at scale.
Why Move Data from Mimer SQL into Databricks
Unlocking Analytics: The Benefits of Copying Data from Mimer SQL to Databricks
A data engineer or architect may wish to copy data from Mimer SQL into Databricks for several compelling reasons. Firstly, the data residing in Mimer SQL may hold significant analytical or operational value, particularly when combined with information from other sources across the enterprise. Integrating this data within Databricks’s unified analytics environment enables the creation of richer, more comprehensive datasets, thereby enhancing insights and supporting advanced analytics and machine learning projects. Furthermore, executing the data integration and processing within Databricks mitigates the risk of overloading the Mimer SQL system, as the bulk of computational and transformation work occurs on Databricks’ scalable platform. This approach helps maintain the performance and responsiveness of the original Mimer SQL environment while unlocking the full value of its data.
Creating an Identity in Mimer SQL
In Mimer SQL, a "user" is commonly referred to as an identity. To allow a new person or process to connect to the database, you must create a new identity and, optionally, assign privileges.
Below are instructions for creating an identity in Mimer SQL.
Prerequisites
- You need to be connected to the Mimer SQL database with an account that has system administrator privileges.
- You can execute the commands using the BSQL command-line client or any compatible SQL interface.
1. Creating a New Identity
To create a new identity, use the CREATE IDENT statement. Replace new_user and user_password with the desired username and password:
CREATE IDENT new_user AS USER USING 'user_password';
Example
CREATE IDENT johndoe AS USER USING 'strongP@ssw0rd';
2. Granting Privileges (Optional)
After creating the identity, you typically grant privileges so the user can perform necessary operations in the database. For example, to grant access to the employees table in the schema myschema:
GRANT SELECT, INSERT, UPDATE, DELETE ON myschema.employees TO johndoe;
Or to allow general connect privilege:
GRANT USAGE TO johndoe;
3. For System Administrators (Optional)
To create a system administrator identity, add the AS SYSTEM keyword:
CREATE IDENT admin_user AS SYSTEM USING 'admin_password';
Note: For security, always choose a strong password and review Mimer SQL’s user and privilege documentation for advanced options such as role management and password policy enforcement.
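If you provision identities from a script rather than typing the statements into BSQL, the DDL above can be generated programmatically. The following is a minimal Python sketch; the function name and the identifier validation rule are our own conventions, not part of Mimer SQL:

```python
def build_identity_ddl(name: str, password: str, system: bool = False) -> list:
    """Return the Mimer SQL statement that provisions a new identity.

    Mirrors the CREATE IDENT syntax shown above. The identity name is
    checked as a plain identifier and single quotes in the password are
    doubled, so the generated literal stays well-formed.
    """
    if not name.isidentifier():
        raise ValueError("unsafe identity name: %r" % name)
    literal = password.replace("'", "''")  # escape embedded quotes
    kind = "SYSTEM" if system else "USER"
    return ["CREATE IDENT %s AS %s USING '%s';" % (name, kind, literal)]

# Example: regenerate the johndoe statement from the text above
print(build_identity_ddl("johndoe", "strongP@ssw0rd"))
```

The returned statements would then be executed through BSQL or any SQL interface connected with administrator privileges.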
Installing the JDBC driver
At the time of writing, the Mimer SQL JDBC driver is not included by default with Matillion Data Productivity Cloud. This omission is due to licensing or redistribution restrictions associated with the driver. To use Mimer SQL with Matillion, the appropriate JDBC driver must be downloaded and installed manually.
Downloading the Mimer SQL JDBC Driver
- Navigate to the official Mimer SQL database downloads page: https://www.mimer.com/products/database/
- On the downloads page, look for the JDBC driver package. Mimer provides multiple driver types; ensure you select the Type 4 JDBC driver as it is the recommended pure Java solution suitable for Matillion.
- Download the driver archive to a location accessible from your administrative workstation.
Uploading and Installing the JDBC Driver in Matillion
Once you have obtained the driver:
- Access your Matillion Data Productivity Cloud instance.
- Follow the official instructions for uploading external drivers, documented here: Uploading External Drivers to Matillion.
- Review prerequisites, such as accepted file formats and naming conventions.
- Use the web interface or agent as directed to upload the .jar file for the Mimer JDBC Type 4 driver.
- Verify that the driver appears in your Matillion environment and is selectable when configuring database sources and integrations.
Next Steps
You can refer to the official Matillion documentation for detailed guidance on how to create database queries and configure connections using your newly installed JDBC driver: Database Query Usage Instructions.
Following these steps ensures that you have access to Mimer SQL within Matillion Data Productivity Cloud, even if the driver is not included out of the box.
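Once the driver is installed, it can help to sanity-check your connection settings outside Matillion. The sketch below assembles a Mimer JDBC URL in Python; the URL format, the default port 1360, and the driver class name mentioned in the comment are assumptions to verify against the Mimer JDBC documentation:

```python
def mimer_jdbc_url(host: str, database: str, port: int = 1360) -> str:
    """Assemble a Mimer JDBC connection URL.

    1360 is Mimer SQL's conventional TCP port; confirm both the port
    and the exact URL format against the driver's documentation.
    """
    return "jdbc:mimer://%s:%d/%s" % (host, port, database)

print(mimer_jdbc_url("db.example.com", "sales"))
# A connection could then be opened with the uploaded driver, for example via
# jaydebeapi.connect("com.mimer.jdbc.Driver", url, [user, password], "mimjdbc.jar")
# (driver class and jar name shown here are illustrative -- check your download).
```

If the URL works from a standalone client but not from Matillion, the problem is usually network access rather than the driver, which the next section covers.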
Checking network connectivity
To ensure successful integration between Matillion Data Productivity Cloud and your Mimer SQL database, you must allow incoming network connections to the database from the relevant Matillion agents. The configuration you require will depend on your deployment mode:
- Full SaaS Agent Configuration: If you are using a Full SaaS agent, configure your Mimer SQL database firewall or network rules to allow incoming connections from the IP addresses specified at the following location: Matillion Data Productivity Cloud – Allowing IP Addresses. Only connections from these addresses should be permitted, for security reasons.
- Hybrid SaaS Deployment: For Hybrid SaaS deployments, the Matillion agent will initiate connections from within your own virtual private cloud (VPC). Therefore, ensure your Mimer SQL database allows incoming connections from your VPC address range. If you need to verify connectivity or assist with troubleshooting, refer to the utilities available here: Matillion Exchange – Check Network Access
Additionally, if the Mimer SQL database’s hostname is referenced using DNS (rather than a direct IP address), the chosen Matillion agent (whether Full SaaS or Hybrid SaaS) must be able to resolve the database address. Verify that any required DNS resolution is possible from the agent’s network environment.
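Both the firewall rules and the DNS resolution described above can be exercised with a short reachability probe run from the agent's network environment. This is an illustrative sketch: the hostname is a placeholder, and port 1360 is assumed to be Mimer SQL's conventional TCP port (verify against your installation):

```python
import socket

def check_mimer_reachable(host: str, port: int = 1360, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the database host succeeds.

    Also exercises DNS resolution: an unresolvable hostname raises
    socket.gaierror (a subclass of OSError), reported here as unreachable.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False

# Run from the agent's network environment, substituting your real host:
print(check_mimer_reachable("mimer-host.invalid"))
```

A False result distinguishes only "cannot reach" from "can reach"; pair it with your firewall logs to see whether packets are being dropped or refused.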
Querying Data from a Mimer SQL Database
Example Mimer SQL Queries
To interact with data stored in a Mimer SQL database, you primarily use standard SQL SELECT statements. Here are a few examples:
Selecting all columns from a table:
SELECT * FROM employees;
Selecting specific columns:
SELECT id, first_name, department FROM employees;
Filtering rows with conditions:
SELECT * FROM employees WHERE department = 'Sales';
Ordering the results:
SELECT id, last_name FROM employees ORDER BY hired_date DESC;
Joining tables:
SELECT e.id, e.first_name, d.department_name FROM employees AS e JOIN departments AS d ON e.department_id = d.id;
Datatype Conversion Considerations
When moving data between Mimer SQL and a platform like Databricks, be aware that some datatype conversion may occur. Both systems support standard SQL types (e.g., INTEGER, VARCHAR, DATE), but always check for compatibility and apply explicit casts if necessary. For example:
SELECT CAST(hired_date AS VARCHAR(25)) AS hired_date_str FROM employees;
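When mapping a whole schema, it can help to centralize the type conversions in one place. The mapping below is illustrative, not an authoritative cross-reference: verify precision, scale, and edge cases against both platforms' type documentation before relying on it:

```python
# Illustrative mapping from common Mimer SQL types to Databricks SQL types.
TYPE_MAP = {
    "INTEGER": "INT",
    "BIGINT": "BIGINT",
    "SMALLINT": "SMALLINT",
    "DECIMAL": "DECIMAL",   # carry precision/scale across explicitly
    "FLOAT": "DOUBLE",
    "VARCHAR": "STRING",    # Databricks strings are unbounded
    "NVARCHAR": "STRING",
    "CHAR": "STRING",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP",
    "BLOB": "BINARY",
}

def target_type(mimer_type: str) -> str:
    """Map a Mimer column type (optionally with a length, e.g. VARCHAR(128))
    to a Databricks type, falling back to STRING for anything unrecognized."""
    base = mimer_type.split("(")[0].strip().upper()
    return TYPE_MAP.get(base, "STRING")

print(target_type("VARCHAR(128)"))  # STRING
print(target_type("integer"))       # INT
```

Falling back to STRING is a deliberately conservative choice: it never loses data, at the cost of requiring an explicit cast downstream.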
Initial Load and Incremental Load Patterns
The most reliable and efficient pattern for loading data from a Mimer SQL database is to perform a one-off initial full load, followed by periodic incremental loads. This technique minimizes data transfer volume and supports efficient data synchronization with systems such as Databricks.
Use the same Database Query component for both initial and incremental loading, with variations in the filtering logic:
Initial Load (Full Extract)
- No filter is applied; all data is transferred.
- Example:
SELECT * FROM employees;
Incremental Load (Delta Extract)
- A filter is used to select only records modified or inserted since the last load—usually using a timestamp or an incrementing, unique column.
- Example (using a last_updated timestamp):
SELECT * FROM employees WHERE last_updated > '2024-06-01 00:00:00';
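Because the two query shapes differ only in their filter, both can be produced by one small helper, with the absence of a stored watermark signalling the initial load. This Python sketch uses hypothetical table and column names and the plain timestamp-literal style of the example above:

```python
from datetime import datetime
from typing import Optional

def extract_query(table: str, watermark_column: str,
                  last_watermark: Optional[datetime]) -> str:
    """Build the extraction query for one load cycle: a full extract on
    the first run, a delta extract once a watermark has been recorded."""
    base = "SELECT * FROM %s" % table
    if last_watermark is None:
        return base  # initial load: no filter
    return "%s WHERE %s > '%s'" % (
        base, watermark_column, last_watermark.strftime("%Y-%m-%d %H:%M:%S"))

print(extract_query("employees", "last_updated", None))
print(extract_query("employees", "last_updated", datetime(2024, 6, 1)))
```

In Matillion the same effect is typically achieved by substituting a pipeline variable into the Database Query component's WHERE clause.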
Note: Refer to Matillion’s Incremental Load Data Replication Strategy for best practices on managing and automating incremental loads across ETL workflows.
By following this approach, you ensure robust, maintainable data integration workflows between Mimer SQL and your downstream platform.
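The full cycle of the approach, selecting rows past the watermark and then advancing it, can be sketched in miniature. Here an in-memory dict and list stand in for whatever durable watermark store and source table you actually use (a control table, a file, or a Matillion pipeline variable); the column names are hypothetical:

```python
def run_load_cycle(state: dict, source_rows: list) -> list:
    """One extract cycle: select rows newer than the stored watermark,
    then advance the watermark to the newest value seen in the batch."""
    watermark = state.get("last_updated")
    batch = [row for row in source_rows
             if watermark is None or row["last_updated"] > watermark]
    if batch:
        state["last_updated"] = max(row["last_updated"] for row in batch)
    return batch

rows = [{"id": 1, "last_updated": "2024-06-01"},
        {"id": 2, "last_updated": "2024-06-02"}]
state = {}
print(len(run_load_cycle(state, rows)))  # initial load: 2 rows
rows.append({"id": 3, "last_updated": "2024-06-03"})
print(len(run_load_cycle(state, rows)))  # incremental load: 1 new row
```

Note that advancing the watermark only after a successful load is what makes the cycle safely re-runnable: a failed run leaves the watermark untouched, so the next run picks up the same rows again.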
Data Integration Architecture
Loading data into Databricks before carrying out integration is an effective "divide and conquer" strategy that separates the loading phase from the integration and transformation phases. This approach is a key advantage of the ELT (Extract, Load, Transform) architecture, as it allows organizations to first ingest raw data at scale, and then focus on transforming and integrating this data as needed. Critical to this process are data transformation pipelines, which provide a robust and maintainable way to carry out required transformations as part of the integration workflow. Another significant advantage of the ELT architecture is that both data transformation and integration are performed directly inside the target Databricks database. As a result, these operations are optimized for speed, can be triggered on-demand, and benefit from the scalable Databricks compute environment. This reduces costs by eliminating the need for separate data processing infrastructure outside of Databricks.