Extracting data from QuestDB to Amazon Redshift
Extracting data from QuestDB is a crucial step for organizations seeking to integrate real-time analytics and time-series data into their broader data ecosystems. Amazon Redshift, with its robust data warehousing capabilities, is a popular destination for such data, enabling advanced analysis and reporting across a wide range of use cases. This article provides a step-by-step guide on moving data from QuestDB to Redshift. We’ll begin by covering how to create an identity within QuestDB, ensuring secure and consistent access. For users leveraging Matillion for ETL jobs, we will review how to verify and, if necessary, obtain the appropriate JDBC driver for QuestDB connectivity. We’ll also address the key consideration of network connectivity between source and target environments to ensure reliable data transfer. Finally, the article will demonstrate how to query data from QuestDB—both for an initial bulk load and for ongoing incremental updates—laying the foundation for an effective and repeatable data extraction workflow.
What is QuestDB?
QuestDB is a high-performance, open-source time series database designed for demanding real-time analytics on large volumes of time-stamped data. Written in Java, it employs a column-oriented storage engine and leverages memory-mapped I/O to enable ultrafast ingestion rates—often exceeding millions of records per second—while supporting full SQL syntax including JOINs and complex aggregations. QuestDB features native support for popular protocols such as InfluxDB line protocol, PostgreSQL wire protocol, and REST, facilitating integration with a wide variety of data sources and visualization tools. Its zero-GC architecture and vectorized query execution provide low latency and efficient resource utilization, making it ideal for use cases spanning financial services, IoT, and observability.
What is Amazon Redshift?
Amazon Redshift is a fully managed, cloud-based data warehouse designed for OLAP workloads. Based on PostgreSQL, it adds columnar storage, parallel query execution, and advanced compression for fast complex queries on large datasets. Redshift integrates with AWS services like S3 and Glue, and business intelligence tools. Clusters are easily scalable for both performance and cost. Features like concurrency scaling and Spectrum allow dynamic resources and querying of exabyte-scale S3 data. Security is strong, with encryption at rest and in transit, VPC network isolation, and detailed identity and access controls.
Why Move Data from QuestDB into Amazon Redshift
Unlocking Advanced Analytics: Copying Data from QuestDB to Redshift
A data engineer or architect may seek to copy data from QuestDB to Amazon Redshift for several compelling reasons. First, QuestDB often stores time-series data or other valuable datasets that can provide significant business insights when analyzed appropriately. However, the true value of this data is often realized when it is integrated with information from additional sources, enabling more comprehensive analytics and richer decision-making. By offloading the integration and complex analytic workloads onto Redshift—a robust, scalable data warehouse—the operational burden on QuestDB is minimized. This ensures that QuestDB can maintain its performance for data ingestion and real-time querying, while Redshift handles heavy data transformations, complex joins, and large-scale analysis, facilitating efficient and effective utilization of all available data.
Creating a User Identity in QuestDB
QuestDB, as of version 7.x (and current preview versions), does not support granular, SQL-based user management such as `CREATE USER` or `GRANT`. Instead, QuestDB uses a simplified security model centered on database instance access and, optionally, authentication via HTTP or the PostgreSQL wire protocol.
Below are instructions to secure access and, where relevant, configure authentication for different interfaces.
1. HTTP Authentication (REST and Web Console)
You can enable HTTP Basic Authentication for the web console and REST API endpoints.
Steps:
- Edit the configuration file (conf/server.conf) or set the equivalent environment variables.
- Ensure the HTTP interface is not restricted to read-only access:

```ini
http.security.readonly=false
```

- Set the authentication type to Basic:

```ini
http.security.authentication=basic
```

- Define the username and password via configuration keys:

```ini
http.security.username=myuser
http.security.password=mypassword
```

In a Docker environment, use (for example):

```bash
-e QDB_HTTP_SECURITY_AUTHENTICATION=basic \
-e QDB_HTTP_SECURITY_USERNAME=myuser \
-e QDB_HTTP_SECURITY_PASSWORD=mypassword
```

Result: The HTTP endpoints now require authentication using these credentials.
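Once Basic Authentication is enabled, every HTTP client must present credentials with each request. As an illustration, the following standard-library Python sketch builds an authenticated call to QuestDB's REST `/exec` endpoint; the host, port, and credentials are placeholders matching the example configuration above.

```python
# Sketch: querying QuestDB's REST /exec endpoint with HTTP Basic Auth.
# Host, port, and credentials below are placeholders for your deployment.
import base64
import json
import urllib.parse
import urllib.request

QUESTDB_HOST = "localhost"  # placeholder: your QuestDB host
QUESTDB_PORT = 9000         # default QuestDB HTTP port

def build_request(sql, user="myuser", password="mypassword"):
    """Construct an authenticated GET request for QuestDB's /exec endpoint."""
    query = urllib.parse.urlencode({"query": sql})
    url = f"http://{QUESTDB_HOST}:{QUESTDB_PORT}/exec?{query}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})

def run_query(sql):
    """Send the query to QuestDB and return the parsed JSON response body."""
    with urllib.request.urlopen(build_request(sql), timeout=10) as resp:
        return json.loads(resp.read())
```

For example, `run_query("SELECT * FROM trades LIMIT 10;")` returns the JSON result set documented for the REST API, provided the server is reachable and the credentials match.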
2. PostgreSQL Wire Protocol Authentication
QuestDB supports connections over the PostgreSQL wire protocol, and you can enforce user authentication for this interface.
Steps:
- Edit the configuration file or use environment variables:

```ini
pg.enabled=true
pg.security.authentication=true
pg.security.username=myuser
pg.security.password=mypassword
```

- Optionally, set the listening port:

```ini
pg.factory.port=8812
```

- Restart QuestDB so the configuration changes take effect.

Result: Connecting over the PostgreSQL wire protocol (psql, JDBC, etc.) will now require the configured username and password.
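After restarting, clients authenticate exactly as they would against any PostgreSQL server. The sketch below builds a libpq-style DSN for QuestDB's wire-protocol endpoint (port 8812, default database `qdb`); the commented usage shows one common driver choice, psycopg2, which is an assumption rather than a requirement.

```python
# Sketch: building a connection string for QuestDB's PostgreSQL endpoint.
# Host, credentials, and the psycopg2 driver choice are illustrative.
def questdb_dsn(host="localhost", port=8812,
                user="myuser", password="mypassword", dbname="qdb"):
    """Build a libpq-style DSN for QuestDB's PostgreSQL wire endpoint."""
    return (f"host={host} port={port} dbname={dbname} "
            f"user={user} password={password}")

# Example usage with psycopg2 (installed separately, e.g. psycopg2-binary):
# import psycopg2
# with psycopg2.connect(questdb_dsn()) as conn:
#     with conn.cursor() as cur:
#         cur.execute("SELECT * FROM trades LIMIT 10;")
#         rows = cur.fetchall()
```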
3. Access Control Notes
- At present, all configured users have the same level of privileges (full database access).
- There is no support for per-table, per-database, or schema-level authorization.
- Disabling or securing the database at the network level (firewall, container networks) is recommended for further security.
Example: Environment Variable Setup (Docker)

```bash
docker run -p 9000:9000 -p 8812:8812 \
  -e QDB_HTTP_SECURITY_AUTHENTICATION=basic \
  -e QDB_HTTP_SECURITY_USERNAME=admin \
  -e QDB_HTTP_SECURITY_PASSWORD=secretpassword \
  -e QDB_PG_SECURITY_AUTHENTICATION=true \
  -e QDB_PG_SECURITY_USERNAME=admin \
  -e QDB_PG_SECURITY_PASSWORD=secretpassword \
  questdb/questdb
```
Reference
See the QuestDB configuration documentation for details and further options regarding authentication and security.
Installing the JDBC Driver
At the time of writing, the QuestDB JDBC driver is not bundled by default with Matillion Data Productivity Cloud. This is mainly due to licensing or redistribution restrictions, so you must manually obtain and install the required driver before connecting Matillion to your QuestDB instance.
1. Download the QuestDB JDBC Driver
Visit the official QuestDB documentation at https://questdb.io/docs/reference/api/jdbc/ to locate and download the JDBC driver. When available, select the Type 4 JDBC driver as it is the most suitable for modern environments and does not require any client-side libraries apart from the driver itself.
2. Upload the JDBC Driver to Matillion Data Productivity Cloud
To make the driver available to your Matillion environment, upload the downloaded .jar file using the agent interface. Matillion provides full step-by-step instructions on how to upload and configure third-party JDBC drivers at this URL:
https://docs.matillion.com/data-productivity-cloud/agent/docs/uploading-external-drivers/
3. Configure and Use the Driver
Once the driver has been uploaded and Matillion has been configured to recognize it, you can use the driver to create connections to your QuestDB instance for querying and data actions. Guidance on how to set up and use JDBC drivers within Matillion Data Productivity Cloud can be found here:
https://docs.matillion.com/data-productivity-cloud/designer/docs/database-query/
Refer to both the QuestDB and Matillion documentation for the most up-to-date compatibility advice and further configuration options.
Checking network connectivity
To connect Matillion Data Productivity Cloud to your QuestDB database, you must ensure that QuestDB allows incoming network connections from the appropriate sources, depending on your deployment configuration:
- Full SaaS Agent Configuration: Allow incoming connections to your QuestDB instance from the set of IP addresses Matillion uses for outbound traffic. These IP addresses are documented and kept up to date at: https://docs.matillion.com/data-productivity-cloud/security/docs/allowing-ip-addresses/
- Hybrid SaaS Deployment: Allow incoming connections from the virtual private cloud (VPC) in which your Hybrid SaaS agent is running. You can use the network test utilities provided by Matillion to check and troubleshoot connectivity between your VPC and QuestDB: https://exchange.matillion.com/data-productivity-cloud/pipeline/check-network-access/
Additionally, if you are using a DNS hostname to reference your QuestDB database (rather than a raw IP address), ensure that the Full SaaS agent or your Hybrid SaaS agent can resolve the DNS address. If DNS resolution fails, the Matillion service will not be able to establish a connection to your QuestDB instance.
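A quick pre-flight check from the agent's network can save significant debugging time. The plain-Python sketch below (hostname and port are placeholders) performs the two checks described above: DNS resolution of the QuestDB hostname, followed by a TCP connection attempt to the PostgreSQL wire port.

```python
# Sketch: pre-flight connectivity check for a QuestDB endpoint.
# The hostname and port you pass in are specific to your deployment.
import socket

def check_connectivity(host, port=8812, timeout=5):
    """Return (resolved_ip, reachable): DNS resolution, then a TCP handshake."""
    try:
        ip = socket.gethostbyname(host)  # DNS resolution step
    except socket.gaierror:
        return None, False               # hostname does not resolve
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True              # TCP connection succeeded
    except OSError:
        return ip, False                 # resolved, but port unreachable
```

If `check_connectivity("questdb.example.internal")` returns `(None, False)`, the failure is at the DNS layer; `(ip, False)` points to a firewall, security group, or routing problem instead.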
Querying Data from QuestDB
This guide explains how to query data from a QuestDB database using SQL, addresses type conversion considerations between QuestDB and Redshift, and demonstrates recommended patterns for loading data.
Basic SQL SELECT Queries in QuestDB
QuestDB supports SQL queries familiar to users of relational databases. Here are some query examples:
```sql
-- Select all columns and rows from a table
SELECT * FROM trades;

-- Select specific columns
SELECT timestamp, price, volume FROM trades;

-- Time-based filtering with QuestDB's timestamp syntax
SELECT * FROM trades WHERE timestamp >= '2024-06-01T00:00:00Z';

-- Aggregate data
SELECT symbol, COUNT(*), AVG(price) FROM trades GROUP BY symbol;
```
Handling Datatype Conversion (QuestDB ↔ Redshift)
When moving data between QuestDB and Amazon Redshift, you may encounter datatype differences. For example:
| QuestDB Type | Closest Redshift Type |
|---|---|
| LONG | BIGINT |
| DATE | DATE |
| TIMESTAMP | TIMESTAMPTZ |
| SYMBOL, STRING | VARCHAR |
| DOUBLE | DOUBLE PRECISION |
| BOOLEAN | BOOLEAN |
Ensure that mappings are explicitly handled in your ETL or querying framework to prevent data mismatches or truncation.
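One way to make the mapping explicit is to encode it as a lookup table in your extraction code and fail loudly on any unmapped type rather than let a driver guess. A minimal sketch (the function and dictionary names are illustrative, and the entries mirror the table above):

```python
# Sketch: explicit QuestDB-to-Redshift type mapping for generating DDL.
# Extend the dictionary for any additional QuestDB types your tables use.
QUESTDB_TO_REDSHIFT = {
    "LONG": "BIGINT",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMPTZ",
    "SYMBOL": "VARCHAR",
    "STRING": "VARCHAR",
    "DOUBLE": "DOUBLE PRECISION",
    "BOOLEAN": "BOOLEAN",
}

def redshift_column(name, questdb_type):
    """Map one QuestDB column to a Redshift column definition,
    raising on unmapped types instead of silently truncating data."""
    try:
        return f"{name} {QUESTDB_TO_REDSHIFT[questdb_type.upper()]}"
    except KeyError:
        raise ValueError(f"No Redshift mapping for QuestDB type {questdb_type!r}")
```

For example, `redshift_column("price", "DOUBLE")` yields `"price DOUBLE PRECISION"`, while an unmapped type raises immediately, surfacing the gap before any data is loaded.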
Best Pattern: Initial Load and Incremental Loads
To efficiently replicate or migrate data from QuestDB to Redshift (or any destination), follow a two-phase strategy:
- Initial Load: Extract all data.
- Incremental Loads: Periodically extract new or updated records.
Both phases can use the same Database Query component in ETL tools such as Matillion.
Initial Load Example
Perform the initial load without any filter clause:
```sql
SELECT * FROM trades;
```
This statement fetches the entire dataset for the first ingestion.
Incremental Load Example
Subsequent loads should filter data based on a watermarked column, typically a timestamp or an incrementing ID:
```sql
SELECT * FROM trades WHERE timestamp > '2024-06-10T00:00:00Z';
```
In the above query, replace '2024-06-10T00:00:00Z' with the last-loaded timestamp value.
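Since the initial and incremental queries differ only in the watermark filter, both can be generated from a single helper. A hedged sketch, using the table and column names from the examples above (where your orchestration stores the watermark between runs is up to you):

```python
# Sketch: generating initial-load and incremental-load queries from a
# stored watermark. Table and column names follow the examples above.
def extraction_query(table, ts_column="timestamp", watermark=None):
    """Return a full-extract query when no watermark exists, otherwise an
    incremental query filtered to rows newer than the watermark."""
    if watermark is None:
        return f"SELECT * FROM {table};"  # initial (bulk) load
    return f"SELECT * FROM {table} WHERE {ts_column} > '{watermark}';"
```

On the first run, `extraction_query("trades")` produces the unfiltered bulk query; on later runs, passing the highest timestamp loaded so far yields the incremental variant.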
Tip: For a deeper understanding of incremental load strategies with ETL tools, see Incremental Load: Data Replication Strategy (Matillion).
With this approach, you can efficiently manage and query time-series or tabular data from QuestDB databases, supporting both bulk transfers and low-latency incremental updates to downstream systems like Redshift.
Data Integration Architecture
Loading data in advance of integration, as practiced in the ELT (Extract, Load, Transform) architecture, offers a strategic "divide and conquer" approach by separating the process into two distinct steps: loading data into the target system, and then transforming and integrating it within that environment. Data integration ultimately relies on robust data transformation, and the most effective means to achieve this is through dedicated data transformation pipelines. One of the major advantages of the ELT architecture is that both transformation and integration activities are executed directly inside the target Redshift database. This not only accelerates processing and enables on-demand workflows, but also scales efficiently with Redshift’s compute resources—removing the need to provision or pay for additional external data processing infrastructure.