Extracting data from Raima Database Manager (RDM) to Amazon Redshift
Extracting data from Raima Database Manager (RDM) is an essential step for organizations wishing to leverage the analytical power of Amazon Redshift. Whether your goal is centralized reporting, advanced analytics, or simply data modernization, a well-designed extraction and loading process can ensure seamless integration between these two platforms. This article will guide you through each major stage of the process. We will begin by explaining how to create an identity within RDM, ensuring proper authentication and access management. For those using Matillion, we will discuss how to check for—and, if necessary, acquire—the appropriate JDBC driver to facilitate connectivity. Next, we will review the critical aspects of network connectivity and security between your RDM source and Redshift target environments. Finally, we will cover the practicalities of querying and extracting data, both for an initial data load and for subsequent incremental updates. By following this guide, readers will gain a comprehensive understanding of the tools and processes required to efficiently extract data from Raima Database Manager and load it into Amazon Redshift.
What is Raima Database Manager (RDM)?
Raima Database Manager (RDM) is a high-performance, embedded database engine optimized for real-time, resource-constrained environments like industrial automation, telecommunications, and IoT. Supporting both relational and network models, RDM combines data integrity with fast, flexible navigation. Written in portable C, it runs on diverse OSs and hardware. Features include ACID transactions, multi-user access, a flexible API, and compact footprint, making it ideal for applications needing reliable, local data management without the bulk of traditional client-server databases. Cross-platform compatibility and minimal overhead further contribute to its popularity in systems demanding predictable performance and efficiency.
What is Amazon Redshift?
Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud, designed to enable fast and complex querying of large datasets using standard SQL. Built on top of PostgreSQL, Redshift optimizes analytical workloads through columnar storage, data compression, and massively parallel processing (MPP) architecture. It integrates seamlessly with AWS services, supports high availability through automatic replication, and offers features such as Redshift Spectrum for direct querying of data in Amazon S3. With automatic scaling, workload management, and a pay-as-you-go pricing model, Redshift facilitates efficient, scalable, and secure data warehousing for a wide range of enterprise use cases.
Why Move Data from Raima Database Manager (RDM) into Amazon Redshift
Unlocking Analytics: Copying Data from Raima Database Manager (RDM) to Redshift
There are several compelling reasons why a data engineer or architect might choose to copy data from Raima Database Manager (RDM) into Amazon Redshift. First, RDM often contains operational data that can be of significant value when analyzed or combined with data from other business systems. However, the true potential of this data is realized when it is integrated with information from diverse sources, such as customer management systems, analytics platforms, and external data feeds. By importing RDM data into Redshift—a scalable, cloud-based data warehouse—organizations can perform complex analyses and reporting tasks across unified datasets. Importantly, leveraging Redshift for these purposes helps to avoid imposing additional processing loads on the primary RDM environment, thereby preserving its performance for transactional or embedded applications while still unlocking the value of the data through powerful analytics and integration capabilities.
Creating a User in Raima Database Manager (RDM)
To manage access to your Raima Database Manager (RDM) database, you can create users with specific authentication credentials. User and authentication management is supported via SQL commands. Below are step-by-step instructions and an example SQL script for creating a new user identity in an RDM database.
Prerequisites
- You must have a database created and running.
- You need to be connected to RDM with a user account that has sufficient privileges (typically an administrative user).
Instructions
- Connect to the RDM SQL engine using the provided client (such as `sqlcli` or another interface compatible with RDM SQL).
- Create a user by executing the `CREATE USER` SQL statement. Specify the username and the corresponding password.

  ```sql
  CREATE USER username IDENTIFIED BY 'password';
  ```

  - Replace `username` with the desired username (e.g., `appuser`).
  - Replace `'password'` with a secure password of your choice.
- Grant privileges if the user needs access to database objects or specific actions, for example:

  ```sql
  GRANT SELECT, INSERT ON mytable TO username;
  ```

  - Adjust the privileges (`SELECT`, `INSERT`, etc.) and table name (`mytable`) as required.
Example
Assume you want to create a user called `appuser` with password `StrongP@ssw0rd`:

```sql
CREATE USER appuser IDENTIFIED BY 'StrongP@ssw0rd';
```

Optionally, to allow `appuser` to read and insert data on a table named `customer`:

```sql
GRANT SELECT, INSERT ON customer TO appuser;
```
Tips
- Passwords are case sensitive and should follow your organization's security policies.
- You may need to commit the transaction to apply changes, depending on your session settings:

  ```sql
  COMMIT;
  ```
- Review RDM documentation for details on available user management and privilege commands, as implementations may vary slightly between versions.
Installing the JDBC Driver
The Raima Database Manager (RDM) JDBC driver enables connectivity between the RDM database and third-party tools, such as Matillion Data Productivity Cloud. Please note that, at the time of writing, the RDM JDBC driver is not bundled with Matillion Data Productivity Cloud because of specific licensing or redistribution restrictions. Therefore, the driver must be downloaded and installed manually.
To successfully enable integration, follow the steps outlined below:
1. Download the JDBC Driver
- Visit the official Raima Database Manager product page at https://www.raima.com/products/rdm.
- On the downloads section of the page, locate the JDBC driver suitable for your platform.
- Preferably, select the Type 4 JDBC driver. Type 4 drivers are implemented in pure Java and are typically the most compatible option when integrating with cloud-based services such as Matillion Data Productivity Cloud.
2. Review the Licensing Terms
Before proceeding, ensure you have reviewed and complied with any licensing requirements associated with the RDM JDBC driver. Due to redistribution or licensing restrictions, distribution or automated download may not be permitted, which is why this manual step is required.
3. Install the JDBC Driver in Matillion Data Productivity Cloud
- Once you have obtained the driver JAR file, follow the official Matillion instructions for uploading external JDBC drivers:
- Refer to the guide at https://docs.matillion.com/data-productivity-cloud/agent/docs/uploading-external-drivers/.
- This process generally involves accessing your Matillion environment, navigating to the agent or runtime environment, and uploading the JDBC JAR file to the correct directory.
4. Configure and Use the Driver
- After the JDBC driver is installed, you can configure connections to your RDM database via Matillion Data Productivity Cloud.
- Detailed setup and usage instructions can be found in the official Matillion documentation: https://docs.matillion.com/data-productivity-cloud/designer/docs/database-query/.
- This guide covers how to create data source configurations and leverage the JDBC driver in your workflows.
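Once the driver JAR is in place, connectivity can also be smoke-tested outside Matillion. The sketch below uses the `jaydebeapi` Python package to open a JDBC connection; note that the driver class name and JDBC URL scheme shown are placeholders, not values confirmed by Raima's documentation — substitute the exact class and URL format documented with your RDM JDBC driver.

```python
def build_jdbc_url(host: str, port: int, database: str) -> str:
    """Assemble a JDBC URL. The 'jdbc:raima:rdm' scheme here is a
    placeholder -- use the exact format from your RDM driver docs."""
    return f"jdbc:raima:rdm://{host}:{port}/{database}"


def connect_rdm(jar_path: str, host: str, port: int, database: str,
                user: str, password: str):
    """Open a JDBC connection to RDM via jaydebeapi (pip install jaydebeapi).
    Imported lazily so this module loads even without the package."""
    import jaydebeapi
    return jaydebeapi.connect(
        "com.raima.rdm.jdbc.RDMDriver",        # placeholder driver class
        build_jdbc_url(host, port, database),
        [user, password],
        jar_path,                              # path to the downloaded JAR
    )
```

A successful `connect_rdm` call confirms that the JAR, credentials, and network path all work before you wire the same settings into the Matillion Database Query component.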
Checking network connectivity
To enable connectivity between Matillion Data Productivity Cloud and your Raima Database Manager (RDM) database, you must ensure that the RDM database permits incoming network connections as follows:
- Full SaaS agent configuration: The RDM database server must allow incoming connections from the IP addresses used by Matillion Data Productivity Cloud’s SaaS agents. The current list of required IP ranges is documented at https://docs.matillion.com/data-productivity-cloud/security/docs/allowing-ip-addresses/
- Hybrid SaaS deployment: If you are using a Hybrid SaaS deployment, configure the RDM database to allow incoming connections from your own Virtual Private Cloud (VPC) or similar network environment. To verify connectivity and for additional tools, you can access network utilities at https://exchange.matillion.com/data-productivity-cloud/pipeline/check-network-access/
Additionally, if you are referencing the RDM database using a DNS hostname, the Full SaaS or Hybrid SaaS agent must be able to resolve this address. Ensure that the relevant DNS records are accessible from the network where your selected deployment is running.
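The DNS requirement above can be checked with a few lines of standard-library Python run from the network where the agent is deployed — a minimal sketch, not an official Matillion tool:

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if the hostname resolves from this network --
    the same lookup the Full SaaS or Hybrid SaaS agent must be
    able to perform for your RDM host."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False
```

Run it against your RDM hostname (e.g. `can_resolve("rdm.internal.example.com")`); a `False` result points to a DNS configuration gap rather than a firewall issue.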
Querying Data from a Raima Database Manager (RDM) Database
This guide explains how to query data from a Raima Database Manager (RDM) database, with practical SQL examples, notes on datatype conversion (particularly in Redshift integrations), and effective strategies for both initial and incremental data load processes.
1. Example RDM SQL SELECT Statements
RDM supports a subset of the SQL standard for querying data. Here are a few examples:
```sql
-- Selecting all columns from a table called "customers"
SELECT * FROM customers;

-- Selecting specific columns
SELECT customer_id, first_name, last_name FROM customers;

-- Applying a WHERE filter
SELECT * FROM orders WHERE order_date >= '2024-01-01';

-- Using ORDER BY and LIMIT (if supported by your RDM version)
SELECT * FROM products ORDER BY price DESC LIMIT 10;
```
Note: RDM’s SQL dialect may not support all standard SQL functions. Refer to your RDM documentation for precise syntax and supported functions/operators.
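The same query shapes can be exercised from any Python DB-API client. In this sketch an in-memory SQLite database stands in for the RDM connection (in practice you would connect through the RDM client or its JDBC driver); the table and sample rows are illustrative:

```python
import sqlite3

# In-memory SQLite stands in for an RDM connection in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", "Lovelace"), (2, "Grace", "Hopper")],
)

# Selecting specific columns, as in the SQL examples above
rows = conn.execute(
    "SELECT customer_id, first_name FROM customers ORDER BY customer_id"
).fetchall()
print(rows)  # [(1, 'Ada'), (2, 'Grace')]
```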
2. Datatype Conversion: RDM to Redshift
When integrating RDM with Amazon Redshift or other analytics platforms, datatype conversion is required, as not all RDM datatypes have a direct counterpart in Redshift:
| RDM Datatype | Recommended Redshift Datatype |
|---|---|
| CHAR(n), VARCHAR | VARCHAR(n) |
| SMALLINT | SMALLINT |
| INTEGER | INTEGER |
| FLOAT, DOUBLE | FLOAT8 |
| DATE, TIME | TIMESTAMP |
| BINARY, BLOB | VARBYTE |
Tip: Always validate your ETL process to ensure correct casting and rounding, particularly for dates and floating-point types.
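In an automated pipeline, the conversion table above can be encoded as a small lookup. The sketch below implements exactly the mappings listed, preserving length specifiers for character types; treat it as a starting point and extend it for any vendor-specific types in your schema:

```python
import re

# Lookup encoding the RDM -> Redshift conversion table above.
RDM_TO_REDSHIFT = {
    "CHAR": "VARCHAR",
    "VARCHAR": "VARCHAR",
    "SMALLINT": "SMALLINT",
    "INTEGER": "INTEGER",
    "FLOAT": "FLOAT8",
    "DOUBLE": "FLOAT8",
    "DATE": "TIMESTAMP",
    "TIME": "TIMESTAMP",
    "BINARY": "VARBYTE",
    "BLOB": "VARBYTE",
}


def redshift_type(rdm_type: str) -> str:
    """Map an RDM column type such as 'VARCHAR(50)' to a Redshift type,
    keeping the length specifier only for character types."""
    m = re.match(r"^\s*([A-Za-z]+)\s*(\(\s*\d+\s*\))?\s*$", rdm_type)
    if not m:
        raise ValueError(f"Unrecognized RDM type: {rdm_type}")
    base, length = m.group(1).upper(), m.group(2) or ""
    target = RDM_TO_REDSHIFT.get(base)
    if target is None:
        raise ValueError(f"No mapping for RDM type: {base}")
    return target + (length if target == "VARCHAR" else "")
```

For example, `redshift_type("CHAR(10)")` yields `VARCHAR(10)` and `redshift_type("DOUBLE")` yields `FLOAT8`.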
3. Loading Patterns: Initial and Incremental Loads
The recommended data integration pattern is:
- Initial Load: Perform a once-off bulk import of all available data from RDM.
- Incremental Loads: Regularly transfer only new or updated rows.
The same Database Query component can be reused for both, with a key difference: the use of a `WHERE` filter for incremental loads.
a. Initial Load Query
- No additional filter is required; fetch all rows.

```sql
SELECT * FROM customers;
```

Or, for specific columns:

```sql
SELECT customer_id, first_name, last_name, last_modified FROM customers;
```
b. Incremental Load Query
- Use a `WHERE` clause to fetch only rows added/changed since the last extraction, often based on a timestamp or auto-incrementing key.

```sql
SELECT * FROM customers WHERE last_modified > '2024-06-01 00:00:00';
```
This incremental loading strategy minimizes data movement and improves performance.
Read more: Incremental-Load Data Replication Strategy (Matillion Exchange)
4. Using the Database Query Component
- Initial Load: Leave the filter blank in the Database Query component.
- Incremental Load: Configure the component with an appropriate filter clause, usually referencing a stored "last loaded" value.
Pseudo-configuration example:

```
query: "SELECT * FROM customers WHERE last_modified > ?"
```
Replace the question mark with a parameter that is set to your last successful load's timestamp.
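Putting the two modes together, the choice between initial and incremental query reduces to a small helper. This is a sketch with illustrative table and column names; in production, pass the watermark as a bound parameter rather than interpolating it into the SQL string:

```python
def build_extract_query(table: str, watermark_column: str = None,
                        last_loaded: str = None) -> str:
    """Build the extraction query for the Database Query component.

    With no watermark, return a full-table query (initial load);
    otherwise append a WHERE filter on the watermark column
    (incremental load). Prefer a parameterized query in real use.
    """
    query = f"SELECT * FROM {table}"
    if watermark_column and last_loaded:
        query += f" WHERE {watermark_column} > '{last_loaded}'"
    return query


# Initial load: no filter
initial = build_extract_query("customers")

# Incremental load: filter on the stored "last loaded" timestamp
incremental = build_extract_query("customers", "last_modified",
                                  "2024-06-01 00:00:00")
```

After each successful run, persist the maximum `last_modified` value you loaded and feed it back in as `last_loaded` on the next run.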
By following this pattern, you can efficiently query and synchronize data between RDM and analytics destinations like Redshift.
Data Integration Architecture
Loading data into Amazon Redshift prior to integrating it is an effective way to address data integration challenges by breaking the task into two sequential steps. This approach, a hallmark of the ELT (Extract, Load, Transform) architecture, enables organizations to "divide and conquer" by first loading raw data efficiently, and then performing integration as a separate process. Data integration inherently involves data transformation, which is best handled using data transformation pipelines that facilitate consistency, manageability, and automation of complex transformations. A further significant advantage of the ELT architecture is that data transformation and integration tasks occur directly within the target Redshift database itself. This in-database processing delivers high performance due to parallel execution, enables on-demand scalability, and eliminates the need and expense of separate data processing infrastructure, as all transformations are executed elastically within Redshift’s managed environment.