Extracting data from QuestDB to Snowflake
Extracting data from QuestDB is an essential step for organizations looking to leverage the power and scalability of cloud data warehouses, such as Snowflake. Efficiently moving data from QuestDB into Snowflake enables advanced analytics, broader data integration, and improved business insights. In this article, we will guide you through the critical stages of this process. We will begin by creating an identity in QuestDB to ensure secure and controlled data access. For users employing Matillion, we will explain how to check for, or obtain, the necessary JDBC driver to facilitate connectivity. We will also discuss the requirements and recommendations for establishing robust network connectivity between QuestDB (the source) and Snowflake (the target). Finally, we will outline strategies for querying data, both for an initial full extraction and for subsequent incremental loads to keep Snowflake up-to-date with minimal overhead. By following these steps, you will be able to design and implement a reliable and efficient data pipeline between QuestDB and Snowflake.
What is QuestDB?
QuestDB is an open-source, high-performance time series database optimized for fast data ingestion and analytical queries on structured data. Built in Java with a column-oriented, vectorized engine and memory-mapped files, it handles millions of writes per second and complex SQL analytics. QuestDB supports standard and extended SQL, out-of-order ingest, real-time aggregation, geospatial queries, and multiple ingestion protocols (Postgres wire, HTTP, WebSocket, InfluxDB line). Its lightweight design and operational simplicity make it ideal for both cloud-native deployments and edge scenarios, while broad interoperability ensures compatibility with various monitoring and data pipeline tools.
What is Snowflake?
Snowflake is a cloud-native data platform that separates compute and storage for scalable, high-performance analytics. Its multi-cluster architecture enables concurrent workloads without conflict and supports both structured and semi-structured data (e.g., JSON, Avro, Parquet). Running fully managed on AWS, Azure, and Google Cloud, it offers automatic scaling, secure data sharing, and disaster recovery. With a familiar SQL interface, Snowflake provides advanced features like time travel, zero-copy cloning, and support for external functions, empowering teams to build resilient analytics solutions with minimal maintenance.
Why Move Data from QuestDB into Snowflake
Unlocking Analytics: The Benefits of Copying Data from QuestDB to Snowflake
A data engineer or architect might choose to copy data from QuestDB into Snowflake for several strategic reasons. QuestDB often contains potentially valuable time-series or transactional data, and unlocking its full potential typically involves integrating it with other enterprise data sources, such as customer information, sales records, or external datasets. By moving this data into Snowflake—a powerful, scalable cloud data platform—organizations can perform advanced analytics, business intelligence, and reporting, leveraging the combined value of disparate datasets. Additionally, conducting these data integrations and analyses within Snowflake rather than directly in QuestDB helps prevent additional query or processing workload on QuestDB itself, thereby preserving its performance and reliability for core transactional operations.
Creating a User in QuestDB
As of QuestDB version 7.3, the open-source edition does not support traditional user management or role-based authentication in the way PostgreSQL or MySQL do. All database interactions are performed as a single built-in user, and QuestDB relies on network-level and operating-system-level controls for securing access. (Role-based access control is available in QuestDB Enterprise.)
Current State: Authentication and User Management
QuestDB currently provides the following mechanisms for controlling access:
- HTTPS/HTTP Authentication: The Web Console can be configured with basic authentication.
- API Authentication: REST APIs can be protected with HTTP Basic Auth when enabled.
- Network Restrictions: Access can be controlled using firewalls and binding QuestDB to specific network interfaces.
- Docker and OS-level Security: If running QuestDB in Docker or as a system service, standard container or OS-level user management best practices should be applied.
Example: Securing the Web Console
To enable HTTP Basic Authentication for the Web Console:
- Locate your `server.conf` file (typically in the `conf` directory).
- Add or modify the following properties (verify the exact property names against the configuration reference for your QuestDB version):

```properties
http.security.readonly=true
http.security.auth.basic=true
http.security.auth.basic.username=admin
http.security.auth.basic.password=YOUR_SECRET_PASSWORD
```
- Restart QuestDB for the changes to take effect.
Placeholder Example: Creating a User Table (Workaround Approach)
If your requirement is to track application users at the data layer (for example, for auditing or multi-tenancy), you can create a logical "user" table:
```sql
CREATE TABLE users (
  user_id SYMBOL,
  password_hash STRING,
  created_at TIMESTAMP
) timestamp(created_at);
```

(Note that QuestDB does not support primary key constraints; uniqueness of `user_id` must be enforced by the application.)
You can then add entries using:

```sql
INSERT INTO users (user_id, password_hash, created_at)
VALUES ('alice', 'b109f3...', systimestamp());
```
Note: This "user" table does not grant any database-level permissions or access control; it is purely an application-managed table.
For updates on user management and authentication in QuestDB, please refer to the QuestDB documentation.
Installing the JDBC driver
At the time of writing, the JDBC driver for QuestDB is not bundled with Matillion Data Productivity Cloud by default. This is due to licensing or redistribution restrictions. To use QuestDB as a data source or target, you will need to manually download and install the JDBC driver into your Matillion environment.
Step 1: Download the QuestDB JDBC Driver
- Visit the official QuestDB JDBC documentation page: https://questdb.io/docs/reference/api/jdbc/
- QuestDB implements the PostgreSQL wire protocol, so connectivity is provided by the standard PostgreSQL JDBC driver (pgJDBC); download the latest release as directed there.
- pgJDBC is a Type 4 ("pure Java") JDBC driver, which gives maximum compatibility with cloud and distributed environments.
Step 2: Install the JDBC Driver in Matillion Data Productivity Cloud
- Consult the Matillion documentation for installing external JDBC drivers: Uploading External Drivers
- Follow the documented procedure to upload the JDBC `.jar` file using the Matillion Agent interface or the appropriate method for your deployment.
- Make sure that the driver file is accessible to the Agent serving your Matillion Data Productivity Cloud jobs.
- Restart your Matillion Agent instance if required by the instructions to ensure the new driver is recognized.
Step 3: Configure and Use the Driver
Once the driver is installed, you can configure and utilize it for database interactions within Matillion:
- Refer to the official usage instructions at Database Query Component - Matillion Designer for setting up your connection to QuestDB, outlining authentication details, connection URLs, and component configuration specific to your use case.
By following these steps, you will enable connectivity between Matillion Data Productivity Cloud and QuestDB, using a compliant and up-to-date JDBC driver.
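Since QuestDB is reached over the PostgreSQL wire protocol (port 8812 by default), the JDBC connection URL follows the pgJDBC format. The sketch below builds such a URL; the host name `questdb.example.com` is a placeholder, and `qdb` is QuestDB's default database name.

```python
def questdb_jdbc_url(host: str, port: int = 8812, database: str = "qdb") -> str:
    """Build a pgJDBC-style connection URL for QuestDB's PostgreSQL wire endpoint."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


# Example with a hypothetical host; QuestDB's default credentials are admin/quest.
url = questdb_jdbc_url("questdb.example.com")
print(url)  # jdbc:postgresql://questdb.example.com:8812/qdb
```

The same URL format is what a Matillion Database Query component would expect in its connection string field.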
Checking network connectivity
To establish a reliable connection between Matillion Data Productivity Cloud and your QuestDB database, you must ensure that QuestDB allows incoming connections from the appropriate sources based on your deployment configuration:
- Full SaaS Agent Configuration: If you are using the Full SaaS deployment, configure the firewall or access control settings of your QuestDB instance to allow incoming connections from the static IP addresses used by Matillion Cloud Agents. You can find the current list of IP addresses to allow at: https://docs.matillion.com/data-productivity-cloud/security/docs/allowing-ip-addresses/
- Hybrid SaaS Deployment: For Hybrid SaaS agents, you need to allow incoming traffic from the network or Virtual Private Cloud (VPC) where your Hybrid Agent runs. Often, this will involve knowing the specific public IP range of your VPC or your cloud environment. To assist in verifying network accessibility and identifying which IPs need access, Matillion provides helpful tools here: https://exchange.matillion.com/data-productivity-cloud/pipeline/check-network-access/
Additionally, if you are specifying your QuestDB database using a DNS hostname (rather than a direct IP address), ensure that the Full SaaS or Hybrid SaaS agent can successfully resolve this hostname to an IP address. This means your DNS must be publicly accessible or, if using private DNS, your configuration must support name resolution from the agent’s network environment.
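As a quick pre-check of both DNS resolution and firewall rules, a small script run from the agent's network can attempt to resolve the QuestDB hostname and open a TCP connection. The host and port below are placeholders for your own deployment (8812 is QuestDB's default PostgreSQL wire port).

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if `host` resolves via DNS and accepts a TCP connection on `port`."""
    try:
        # Resolve the hostname first, so DNS failures are distinguishable.
        addr = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)[0][4]
    except socket.gaierror:
        return False  # DNS resolution failed
    try:
        with socket.create_connection(addr[:2], timeout=timeout):
            return True
    except OSError:
        return False  # connection refused, timed out, or network unreachable


# Example with a placeholder host:
# print(can_reach("questdb.example.com", 8812))
```

A `False` result narrows the problem to either name resolution or network/firewall rules, which maps directly onto the two requirements described above.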
Querying Data from QuestDB
Below are instructions and examples for querying data from a QuestDB database, along with considerations for integration with Snowflake and implementing effective data load patterns.
Example QuestDB SQL SELECT Statements
QuestDB supports standard ANSI SQL for querying data.
Select all columns from a table:

```sql
SELECT * FROM sensors;
```

Select specific columns:

```sql
SELECT sensor_id, temperature, timestamp FROM sensors;
```

Filtering data (e.g., all records with temperature > 35):

```sql
SELECT * FROM sensors WHERE temperature > 35;
```

Ordering and limiting data:

```sql
SELECT * FROM sensors ORDER BY timestamp DESC LIMIT 100;
```

Aggregating data (e.g., average temperature per sensor):

```sql
SELECT sensor_id, AVG(temperature) AS avg_temp FROM sensors GROUP BY sensor_id;
```

Filtering by timestamp range:

```sql
SELECT * FROM sensors WHERE timestamp BETWEEN dateadd('d', -7, now()) AND now();
```
Datatype Conversion Between QuestDB and Snowflake
When moving data between QuestDB and Snowflake, be aware that datatype conversion may be necessary, as the two systems support different datatypes. For example:

- QuestDB `TIMESTAMP` maps to Snowflake `TIMESTAMP_NTZ`.
- QuestDB `SYMBOL` may be best mapped to a Snowflake `VARCHAR`.
- Numeric and string types generally map directly, but validate lengths and precision.
Perform appropriate type mapping or casting during the transition, based on your schema and compatibility requirements.
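An explicit lookup table can make these conversions auditable in pipeline code. The mapping below is a sketch built from the pairs discussed above plus common QuestDB scalar types; treat it as a starting point to adjust for your schema, not an exhaustive or authoritative mapping.

```python
# Suggested QuestDB -> Snowflake type mapping (adjust per schema requirements).
QUESTDB_TO_SNOWFLAKE = {
    "TIMESTAMP": "TIMESTAMP_NTZ",
    "SYMBOL": "VARCHAR",
    "STRING": "VARCHAR",
    "DOUBLE": "DOUBLE",
    "FLOAT": "FLOAT",
    "LONG": "BIGINT",
    "INT": "INTEGER",
    "BOOLEAN": "BOOLEAN",
    "DATE": "DATE",
}


def map_type(questdb_type: str) -> str:
    """Translate a QuestDB column type to its Snowflake equivalent."""
    try:
        return QUESTDB_TO_SNOWFLAKE[questdb_type.upper()]
    except KeyError:
        raise ValueError(f"No mapping defined for QuestDB type: {questdb_type}")
```

Failing loudly on unmapped types (rather than defaulting to `VARCHAR`) surfaces schema drift early, before bad data lands in Snowflake.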
Load Pattern: Initial and Incremental Loads
The recommended pattern for data ingestion involves:

1. A once-off initial load: replicating all historical data from QuestDB to your destination (e.g., Snowflake).
2. Subsequent incremental loads: replicating only new or changed data since the last successful load.
Always use the same Database Query component for both load types to simplify maintenance.
Initial Load
- No filter clause on the query; load the full dataset.

Example:

```sql
SELECT * FROM sensors;
```

(The query above will extract all rows from the `sensors` table.)
Incremental Load
- Add a filter to select only new or updated rows, usually based on a column such as `timestamp` or an incrementing primary key.

Example:

```sql
SELECT * FROM sensors WHERE timestamp > :last_loaded_timestamp;
```

Here, `:last_loaded_timestamp` should be replaced or bound by your job/orchestration to the maximum timestamp already present in Snowflake after the previous ETL run.
For additional details and best practices on incremental load strategies, see the Matillion Exchange article.
References:

- QuestDB SQL Documentation
- Matillion Community: Incremental Load Data Replication Strategy
Data Integration Architecture
Loading data in advance of integration is a core strategy in modern data engineering, representing a key advantage of the Extract, Load, and Transform (ELT) architecture. By splitting the process into two distinct steps—first loading raw data into Snowflake, and then integrating and transforming it within the database—organizations effectively divide and conquer the complex problem of data integration. Data integration inherently involves significant data transformation in order to create unified, analytics-ready datasets. Data transformation pipelines are the state-of-the-art solution for performing these changes systematically and reliably. Another major benefit of the ELT model is that all of this transformation and integration activity takes place directly inside the target Snowflake environment. This approach leverages the speed, scalability, and elasticity of Snowflake's cloud-native architecture, allowing transformation tasks to run on demand with no need for dedicated external processing infrastructure, tools, or servers—reducing both complexity and cost.