Extracting data from Raima Database Manager (RDM) to Snowflake
Extracting data from Raima Database Manager (RDM) is a crucial task for organizations seeking to leverage advanced analytics and scalable storage in cloud data platforms such as Snowflake. Effective data integration enables businesses to unlock the full potential of their operational data by making it readily available for reporting, analysis, and business intelligence. In this article, we will guide you through the essential steps for moving data from RDM to Snowflake. We will begin by detailing how to create an identity in RDM—an important prerequisite for secure access. If you are a Matillion user, we will discuss how to check for or acquire the necessary JDBC driver to facilitate the connection. Next, we will explore the considerations and steps involved in establishing reliable network connectivity between your source (RDM) and target (Snowflake) environments. Finally, we will outline approaches for querying and extracting data, addressing both the initial transfer and ongoing incremental loads to ensure your data in Snowflake remains current and actionable. By following this guide, you will gain a clear understanding of how to set up a robust and efficient data pipeline from RDM to Snowflake, tailored for both manual and Matillion-integrated workflows.
What is Raima Database Manager (RDM)?
Raima Database Manager (RDM) is an embedded, high-performance database with a small footprint, ideal for real-time and resource-limited settings like IoT devices, industrial automation, and edge computing. RDM supports SQL, ACID transactions, and a navigational API for direct access. Optimized for reliability and determinism, it features multi-threading, built-in replication, and flexible storage—both in-memory and on-disk. RDM operates standalone or embedded within applications and supports multiple programming languages, making it suitable for diverse embedded and enterprise applications.
What is Snowflake?
Snowflake is a cloud-native data platform that provides a highly scalable, elastic, and fully managed data warehouse solution, enabling organizations to store, process, and analyze large volumes of structured and semi-structured data. It separates compute and storage resources, allowing workloads to scale independently and run concurrently without performance contention. Snowflake also supports ANSI SQL, native support for JSON, Avro, Parquet, and other data formats, and offers powerful features such as secure data sharing, automatic clustering, zero-copy clones, and multi-cloud deployment across AWS, Azure, and Google Cloud. Its architecture abstracts infrastructure management, simplifies scaling, and facilitates advanced analytics, making it a popular choice for modern data engineering and analytics use cases.
Why Move Data from Raima Database Manager (RDM) into Snowflake
Leveraging Raima Database Manager Data in Snowflake for Enhanced Analytics
A data engineer or architect may wish to copy data from Raima Database Manager (RDM) into Snowflake to unlock the full potential of the valuable data stored in RDM. While RDM efficiently manages and stores operational or embedded data, its true value emerges when this data is integrated with information from other sources, enabling richer analytics and comprehensive business insights. By transferring RDM data into Snowflake, organizations can leverage Snowflake’s scalable architecture and robust data integration capabilities without imposing additional workload on the operational RDM environment. This approach ensures that analytical and reporting processes do not interfere with the performance and reliability of critical transactional systems powered by RDM.
Creating a User in Raima Database Manager (RDM)
Raima Database Manager (RDM) provides user and security management features starting from RDM 14.1, where database identities (users) can be created and managed. Users are associated with roles to control their access and permissions within a database. The following instructions guide you through the process of creating a new user (identity) in an RDM database using SQL.
Prerequisites
- RDM 14.1 or later.
- Sufficient privileges to create users (requires the SECURITY_ADMIN role).
- Access to the RDM SQL Command Line or a similar SQL client.
1. Connect to the Database
```
-- Example: connect as an existing admin user
CONNECT TO 'your_database'   -- replace with your database name
AUTHORIZATION 'admin_username', 'admin_password';
```
2. Create a New User (Identity)
RDM uses the CREATE IDENTITY statement to create users. You must assign a username and password, and can optionally grant roles.
```
-- Syntax: create a user (identity) and assign a password
CREATE IDENTITY username
AUTHENTICATED BY 'user_password';
```
Example:
```
CREATE IDENTITY alice
AUTHENTICATED BY 'Secur3Passw0rd!';
```
3. Assign a Role to the User
To grant permissions, you assign one or more roles to the new user. The built-in role PUBLIC is always granted; additional roles (e.g., DBA, READONLY, SECURITY_ADMIN) can be granted using GRANT.
```
-- Grant the DBA role to the new user
GRANT DBA TO alice;
```
You can also create custom roles as needed and assign them to the user.
4. Confirm User Creation (Optional)
You can view all created identities using:
```
-- List all users (identities)
SELECT * FROM rdm_sys.identities;
```
Note: Usernames and passwords are case-sensitive. Ensure that you replace username, user_password, and role names with values appropriate to your system and security requirements.
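When provisioning users as part of an automated setup, the statements above can be assembled programmatically. The sketch below simply builds the SQL strings shown in this section; the helper name and interface are our own invention, not part of RDM.

```python
# Hypothetical helper that assembles the RDM SQL statements from this
# section: one CREATE IDENTITY statement followed by a GRANT per role.
# Statement syntax follows the RDM 14.1+ examples above.

def create_identity_sql(username, password, roles=()):
    """Return the ordered SQL statements to create a user and grant roles."""
    statements = [
        f"CREATE IDENTITY {username} AUTHENTICATED BY '{password}';"
    ]
    for role in roles:
        statements.append(f"GRANT {role} TO {username};")
    return statements

# Example: statements for the 'alice' user from the walkthrough
for stmt in create_identity_sql("alice", "Secur3Passw0rd!", roles=["DBA"]):
    print(stmt)
```

Running the statements in this order matters: the identity must exist before any GRANT referencing it.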
Installing the JDBC driver
At the time of writing, the Raima Database Manager (RDM) JDBC driver is not bundled with Matillion Data Productivity Cloud by default. This is due to licensing or redistribution restrictions imposed by Raima. As such, if you require integration with an RDM database, you will need to obtain and install the JDBC driver manually.
Download the JDBC driver
To get started, download the appropriate JDBC driver from the official Raima website.
When selecting a driver, prioritize a Type 4 JDBC driver (commonly referred to as a “thin” driver), as this option is generally preferred for use with Matillion, offering better compatibility and performance in cloud-based environments.
Install the JDBC driver in Matillion Data Productivity Cloud
Once you have downloaded the correct JDBC driver JAR file, you need to upload it into your Matillion environment so that it can be used in data integrations and transformations. Matillion provides a documented process for uploading third-party (external) JDBC drivers:
- Follow the steps provided in the Matillion documentation at https://docs.matillion.com/data-productivity-cloud/agent/docs/uploading-external-drivers/
This documentation explains how to add the JAR file to your Matillion Agent and ensure that it is recognized and available for use in orchestrations and data pipelines.
Configuring database queries
After installing the driver, you can use it within your data workflows in Matillion to perform operations against your RDM database. For details on configuring database queries and using custom drivers within Matillion, refer to the Matillion documentation for the Database Query component.
These instructions explain how to specify connection parameters and utilize the external JDBC driver within the Database Query component of your data pipelines.
Make sure that configuration values such as hostname, port, database, and credentials are appropriate for your RDM instance and network environment.
By following these steps, you will be able to securely add RDM connectivity to your Matillion Data Productivity Cloud workflows.
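For a rough sense of what an RDM JDBC connection from code could look like, the sketch below builds a JDBC URL and shows a jaydebeapi call. The driver class name, URL format, and port are illustrative assumptions, not documented values; confirm the actual details in the documentation bundled with the Raima driver.

```python
# Sketch of connecting to RDM over JDBC from Python via jaydebeapi.
# The URL scheme, driver class name, and port below are ASSUMPTIONS for
# illustration only -- verify them against the Raima driver documentation.

def build_jdbc_url(host, port, database):
    """Assemble an RDM-style JDBC URL (format assumed, not authoritative)."""
    return f"jdbc:raima:rdm://{host}:{port}/{database}"

url = build_jdbc_url("rdm-host.example.com", 21553, "your_database")
print(url)

# With the driver JAR downloaded, a connection might then look like:
#
# import jaydebeapi
# conn = jaydebeapi.connect(
#     "com.raima.rdm.jdbc.RDMDriver",   # assumed driver class name
#     url,
#     ["alice", "Secur3Passw0rd!"],
#     "/path/to/rdm-jdbc-driver.jar",
# )
```

Within Matillion itself you would not write this code; the same URL, credentials, and JAR are instead supplied through the Database Query component's configuration.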
Checking network connectivity
When connecting Matillion Data Productivity Cloud to a Raima Database Manager (RDM) database, you must ensure that the RDM database is configured to allow incoming network connections appropriate to your deployment type:
- Full SaaS Agent Configuration: The RDM database must permit incoming connections from the set of IP addresses used by Matillion’s SaaS agents. The required IPs are documented at Allowing IP Addresses. Ensure your database firewall or access control lists are updated accordingly.
- Hybrid SaaS Deployment: If you are using a Hybrid SaaS deployment, configure your RDM database to allow incoming connections from the IP address range of your own virtual private cloud (VPC). You can use the tools available at Check Network Access to verify connectivity and troubleshoot access issues.
Additionally, if you are connecting to the Raima Database Manager (RDM) database using a DNS name instead of an IP address, the resolving agent (whether Full SaaS or Hybrid SaaS) must be able to resolve the DNS name properly. Ensure any relevant DNS records are publicly accessible or correctly configured within your private network to support this requirement.
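Before configuring the connection in Matillion, it can be useful to confirm basic TCP reachability from the machine running the agent. The probe below is a minimal sketch; the host and port shown are placeholders for your own RDM instance.

```python
# Minimal TCP reachability probe: verifies that a host:port accepts
# connections within a timeout. Useful for ruling out firewall or DNS
# problems before debugging driver configuration.
import socket

def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example with placeholder values -- substitute your RDM host and port:
# print(can_reach("rdm-host.example.com", 21553))
```

A False result here points at network configuration (firewall rules, allow-lists, DNS) rather than at the JDBC driver or credentials.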
Querying Data from Raima Database Manager (RDM)
This guide outlines how to query data from a Raima Database Manager (RDM) database, focusing on sample SQL statements, type conversion considerations with Snowflake, and recommended patterns for initial and incremental data loads.
1. Example RDM SQL SELECT Statements
Use standard SQL to query data from RDM tables. Common examples:
```
-- Select all rows from 'customers'
SELECT * FROM customers;

-- Select specific columns from 'orders'
SELECT order_id, customer_id, order_date FROM orders;

-- Aggregate sales by product
SELECT product_id, SUM(quantity) AS total_sold
FROM order_items
GROUP BY product_id;

-- Filtered query with a WHERE clause
SELECT * FROM products WHERE discontinued = 0;
```
2. Considerations for Datatype Conversion (RDM ⇔ Snowflake)
When extracting data from RDM into Snowflake, some datatype conversion is likely. Notable points:

- Integer types (e.g., SMALLINT, INT, BIGINT) map directly, though sizes may differ.
- Floating-point types (FLOAT, DOUBLE) will map, but differences in precision can occur.
- Character/string types (CHAR, VARCHAR) are generally compatible, but be aware of maximum length limitations.
- Date/time types (DATE, TIME, TIMESTAMP) are supported, but always check for timezone and formatting mismatches.
- Nullability, encoding, and other column constraints may need additional handling.
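The conversion notes above can be captured as a simple lookup table. The mapping below is an illustrative starting point, not an authoritative or exhaustive list; verify each type against the documentation for your RDM and Snowflake versions.

```python
# Illustrative RDM -> Snowflake type mapping, following the notes above.
# This is a sketch for building target DDL, not an authoritative table.

RDM_TO_SNOWFLAKE = {
    "SMALLINT":  "SMALLINT",
    "INT":       "INTEGER",
    "BIGINT":    "BIGINT",
    "FLOAT":     "FLOAT",
    "DOUBLE":    "DOUBLE",          # Snowflake stores all floats as double
    "CHAR":      "CHAR",
    "VARCHAR":   "VARCHAR",
    "DATE":      "DATE",
    "TIME":      "TIME",
    "TIMESTAMP": "TIMESTAMP_NTZ",   # assumes no timezone info in the source
}

def to_snowflake_type(rdm_type):
    """Look up a Snowflake type, defaulting to VARCHAR for unknown types."""
    return RDM_TO_SNOWFLAKE.get(rdm_type.upper(), "VARCHAR")
```

Defaulting unknown types to VARCHAR keeps the load from failing, at the cost of deferring proper typing to a later transformation step.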
3. Pattern: Initial and Incremental Loads
The optimal pattern for extracting data from RDM uses both initial (full) and incremental (delta) loads. Reference: Matillion Incremental Load and Data Replication Strategy.
- Initial Load: Extracts the entire contents of a table for the first time.
- Incremental Load: Extracts only data that has been added or modified since the last load, minimizing data transfer.
Both approaches can use the same Database Query component (e.g., Matillion's Database Query), with differences in filtering logic:
Initial Load Example (No Filter Clause)
```
SELECT * FROM orders;
```
Use this when loading all data for the first time.
Incremental Load Example (With Filter Clause)
```
SELECT * FROM orders
WHERE updated_at > '2024-06-10 00:00:00';
```
Update the filter value according to the last known load point (e.g., use the updated_at timestamp or another change-tracking column).
- Ensure a reliable mechanism (such as high water-mark timestamps or numeric keys) for tracking and updating the filter value after each incremental load.
Tip: Always test data extraction queries after datatype conversion to handle any incompatibilities between RDM and Snowflake.
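The initial/incremental pattern above can be sketched as a single query builder that switches on whether a high-water mark is known. The table and column names come from the examples in this section; persisting the watermark between runs (e.g., in a pipeline variable or a control table) is left to your orchestration tool.

```python
# High-water-mark sketch: one builder produces either the full-load query
# (no watermark yet) or the incremental query (filtered past the watermark).
# In a real pipeline the watermark is persisted between runs, not in memory.

def build_extract_query(table, tracking_column=None, last_value=None):
    """Return a full-load query, or an incremental one if a watermark is set."""
    query = f"SELECT * FROM {table}"
    if tracking_column and last_value is not None:
        query += f" WHERE {tracking_column} > '{last_value}'"
    return query + ";"

# Initial load: no watermark recorded yet
print(build_extract_query("orders"))
# Incremental load: only rows changed since the last successful run
print(build_extract_query("orders", "updated_at", "2024-06-10 00:00:00"))
```

After each successful incremental load, record the maximum updated_at value extracted and pass it as last_value on the next run.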
Data Integration Architecture
Loading data in advance of its integration is a fundamental principle of the Extract, Load, and Transform (ELT) architecture. This approach effectively divides the data integration process into two distinct stages: first, raw data is ingested and stored within the Snowflake database; subsequently, the integration and transformation tasks are performed directly on that data. By adopting this model, organizations are able to “divide and conquer” complex integration workflows, simplifying maintenance and troubleshooting.

Data integration inherently involves the transformation of data to fit target models, apply business logic, or standardize formats. The most efficient and reliable way to achieve this is through dedicated data transformation pipelines, which orchestrate, streamline, and automate these processes within Snowflake.

An additional advantage of the ELT architecture is that both transformation and integration occur directly inside the Snowflake environment. As a result, organizations benefit from fast, on-demand, and highly scalable computation, while avoiding the complexity and cost associated with external data processing infrastructure.
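The two ELT stages described above can be sketched as an ordered pair of SQL steps run inside Snowflake. The stage, table, and column names below are hypothetical examples, not part of any real pipeline.

```python
# Illustrative two-stage ELT flow inside Snowflake: stage 1 lands raw data
# unchanged, stage 2 transforms it using Snowflake compute. All object
# names here are hypothetical.

ELT_STAGES = [
    ("extract_load",
     "COPY INTO raw_orders FROM @rdm_stage/orders/ "
     "FILE_FORMAT = (TYPE = CSV);"),
    ("transform",
     "INSERT INTO analytics.orders "
     "SELECT order_id, customer_id, CAST(order_date AS DATE) "
     "FROM raw_orders WHERE order_id IS NOT NULL;"),
]

def stage_names(stages):
    """Return stage names in execution order: load always precedes transform."""
    return [name for name, _ in stages]

print(stage_names(ELT_STAGES))
```

Keeping the raw landing table untouched means a failed or changed transformation can be re-run against it without re-extracting from RDM.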