Integrate data from Vectorwise (Actian Vector) to Snowflake using Matillion

The Actian Vector to Snowflake connector seamlessly transfers your data to Snowflake within minutes, ensuring it remains current without the need for manual coding or intricate ETL procedures.


Extracting data from Vectorwise (Actian Vector) to Snowflake

Extracting data from Actian Vector is a common task for organizations seeking to leverage cloud data platforms such as Snowflake for advanced analytics and increased scalability. However, the process involves several key steps to ensure a smooth and secure data pipeline. This article provides a detailed guide to extracting data from Actian Vector and loading it into Snowflake. We will begin by outlining how to create an appropriate identity in Actian Vector to facilitate access. For Matillion users, we will explain how to check for the presence of the required JDBC driver, or how to acquire it if it is missing. Next, we’ll discuss the vital aspect of establishing network connectivity between your source and target environments. Finally, we’ll explore the strategies for querying data—covering both initial full loads and ongoing incremental extractions—to optimize performance and minimize disruption.


What is Vectorwise (Actian Vector)?

Actian Vector is a high-performance, columnar analytics database designed to accelerate complex query workloads on large datasets. Utilizing vectorized processing, SIMD (Single Instruction, Multiple Data) operations, and advanced data compression, Vector enables in-memory analytical processing with exceptional speed and efficiency. Its architecture allows for seamless integration with industry-standard SQL tools and supports hybrid deployments in on-premises, cloud, and Hadoop environments. Optimized for both ad hoc queries and high-concurrency scenarios, Actian Vector is particularly well-suited for business intelligence, data warehousing, and advanced analytics applications demanding rapid, scalable insights.


What is Snowflake?

Snowflake is a fully managed, cloud-native data platform built on AWS, Azure, and Google Cloud. With its multi-cluster shared architecture, it decouples storage and compute, offering scalable and concurrent processing for tasks from batch processing to real-time analytics. Supporting both structured and semi-structured data (e.g., JSON, Avro, Parquet), Snowflake enables flexible data analysis via a SQL-based engine. Automated infrastructure, security, and optimization let organizations focus on insights, not management. Its usage-based pricing and secure cross-org data sharing make Snowflake a powerful solution for modern data needs.

Why Move Data from Vectorwise (Actian Vector) into Snowflake

Benefits of Transferring Data from Actian Vector to Snowflake for Advanced Analytics

A data engineer or architect may consider copying data from Actian Vector into Snowflake for several compelling reasons. First and foremost, Actian Vector often contains data that is potentially valuable for analytical and business intelligence purposes, yet may reside siloed or isolated from broader organizational workflows. By integrating this data with information from other sources within a centralized platform like Snowflake, organizations can unlock deeper insights and derive greater value through consolidated analytics and advanced reporting. Additionally, leveraging Snowflake for data integration allows IT teams to perform complex data transformations and analyses without imposing additional workload or performance bottlenecks on the Actian Vector system, preserving its resources for transactional or operational functions. This approach enables efficient use of both systems, supporting scalability and agility in modern data architectures.

Creating a User in Actian Vector Database

To create a user (also referred to as an identity) in an Actian Vector database, you will primarily work with SQL statements executed from a suitable SQL client (such as the `sql` terminal monitor or other database access tools). The following instructions outline the standard steps for creating and configuring a new database user in Actian Vector.

Prerequisites

  • You must have administrative privileges or be connected as a user with sufficient rights to create new users.
  • Ensure that you have access to the database instance where you wish to add the user.

Example SQL Script to Create a User

```
-- Connect to your Vector database first, then run the following SQL.
-- Verify exact syntax against the documentation for your Vector version.

-- Basic form: creating a user with a password
CREATE USER example_user WITH PASSWORD = 'StrongPassword1!';

-- Optionally, grant the new user access to a database
GRANT ACCESS ON DATABASE example_db TO example_user;

-- To allow the user access to specific tables, you might add
GRANT SELECT, INSERT ON example_table TO example_user;
```

Note:
Replace `example_user`, `'StrongPassword1!'`, and `example_table` with your preferred username, a secure password, and actual table names, respectively.

Additional Details

  • Password Policy: Ensure the password meets your site’s security requirements.
  • User Authentication: Vector primarily uses database authentication. If your deployment integrates with OS authentication, consult your site administrator.
  • Privilege Assignment: Users need, at minimum, the privilege to connect to the target database. For full application functionality, grant further CRUD permissions as needed.

Listing Current Users

To list existing users in your Actian Vector database:

SELECT * FROM iipusers;  -- System catalog table of users (may vary according to version/config)

For more details or customized permissions, always refer to the specific version of the Actian Vector documentation deployed in your environment.

Installing the JDBC driver

At the time of writing, the JDBC driver for Actian Vector is not bundled with Matillion Data Productivity Cloud. This is due to licensing and redistribution restrictions placed on the driver. As a result, you will need to manually download and install the appropriate JDBC driver before you can establish a connection to your Vector database.

1. Download the JDBC Driver

Visit the official Actian Vector product page at https://www.actian.com/products/actian-vector/ and locate the JDBC driver package for download. If a Type 4 JDBC driver is available, prioritize it over other driver types (such as Type 2): Type 4 drivers operate entirely in Java, require no native code or database client installation, and therefore offer the best compatibility with cloud environments like Matillion.

2. Review Licensing

Before proceeding, ensure that you review and comply with any license agreements associated with the Actian Vector JDBC driver. You may need to register or accept license terms on the Actian website prior to downloading the driver files.

3. Upload the JDBC Driver to Matillion Data Productivity Cloud

Once you have obtained the JDBC driver (typically as a `.jar` file), follow the official Matillion Data Productivity Cloud instructions for uploading external drivers. Detailed, step-by-step guidance is available at: https://docs.matillion.com/data-productivity-cloud/agent/docs/uploading-external-drivers/.

This process typically involves accessing your Matillion agent management console and uploading the JDBC driver `.jar` file, which will then be made available for use in your projects.

4. Configure and Use the Driver

With the driver installed, refer to the Matillion documentation for guidance on using database connectors. Full usage instructions are provided at: https://docs.matillion.com/data-productivity-cloud/designer/docs/database-query/.

These instructions will guide you through configuring connections, setting up credentials, and monitoring database queries via the Matillion Data Productivity Cloud interface, now enabled by your manually installed Actian Vector JDBC driver.
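Once the driver is uploaded, the Database Query component will need a driver class name and a JDBC connection URL. Actian Vector uses the Ingres-family JDBC driver, so the values typically take the form sketched below; the host, port, and database name are placeholders, and the exact class name and URL syntax should be confirmed against the documentation shipped with your driver download.

```
Driver class:  com.ingres.jdbc.IngresDriver
JDBC URL:      jdbc:ingres://vector-host.example.com:VW7/mydb
```

The port portion is often written symbolically as the instance ID plus "7" (e.g., `VW7`); your installation may instead expose a numeric Data Access Server port.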

Checking network connectivity

To ensure successful connectivity between Matillion Data Productivity Cloud and your Actian Vector database, you must configure the database to accept incoming connections according to your deployment type:

  • Full SaaS Agent Configuration:
    If you are using the Full SaaS agent configuration, you must allow incoming connections from the IP addresses published by Matillion. You can find the list of required IP addresses here: Matillion IP Address Documentation. Update your Actian Vector database firewall or networking ACLs to permit access from these addresses.

  • Hybrid SaaS Deployment:
    For a Hybrid SaaS deployment, your Actian Vector database must allow incoming connections from the private IP range(s) used by your own Virtual Private Cloud (VPC). Check which IP addresses or ranges are in use in your environment. You can use network checking utilities provided by Matillion to assist with this process: Matillion Network Access Checker.

Additionally, if the Actian Vector database is referenced in Matillion Data Productivity Cloud using a DNS hostname, ensure that the relevant SaaS agent (Full or Hybrid) is able to resolve this address. Proper DNS resolution is required for connectivity to be established.
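As a quick sanity check before debugging connectivity inside Matillion, the Python sketch below tests both requirements from the host where your agent runs: DNS resolution of the database hostname and TCP reachability of its port. The hostname and port you pass in are placeholders for your own environment.

```python
import socket

def check_endpoint(host: str, port: int, timeout: float = 5.0) -> dict:
    """Report whether a database endpoint's hostname resolves and whether
    a TCP connection to the given port succeeds."""
    status = {"dns_ok": False, "tcp_ok": False}
    try:
        socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
        status["dns_ok"] = True
    except socket.gaierror:
        return status  # hostname does not resolve; no point trying TCP
    try:
        with socket.create_connection((host, port), timeout=timeout):
            status["tcp_ok"] = True
    except OSError:
        pass  # resolved but unreachable: check firewall rules / ACLs
    return status
```

Run it with your Vector host and the port your JDBC driver connects to: a `dns_ok` of False points to DNS configuration on the agent, while `tcp_ok` False with `dns_ok` True points to firewall or ACL rules.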

Querying Data from Actian Vector Database

This guide explains how to issue queries against an Actian Vector database, with examples of SQL SELECT queries and considerations for datatype conversion, particularly when integrating with Snowflake. It also outlines best practices for once-off ("initial") and recurring ("incremental") data loads using the same Database Query component, leveraging filtering for efficient incremental loads.

Example Actian Vector SQL Queries

Here are common examples of queries you might execute against Vector:

```
-- Retrieve all columns and rows from a table
SELECT * FROM sales.orders;

-- Select specific columns
SELECT order_id, order_date, customer_id FROM sales.orders;

-- Filter results with a WHERE clause
SELECT * FROM sales.orders WHERE order_date >= '2024-01-01';

-- Aggregate results
SELECT customer_id, COUNT(*) AS order_count
FROM sales.orders
GROUP BY customer_id;

-- Join multiple tables
SELECT o.order_id, o.order_date, c.customer_name
FROM sales.orders o
JOIN sales.customers c ON o.customer_id = c.customer_id;
```

Datatype Conversion

When integrating Actian Vector with Snowflake, be aware that some datatypes may require conversion or mapping (e.g., `INTEGER` in Vector may become `NUMBER` in Snowflake, and date/timestamp formats may differ). Review both systems' documentation to manage any precision/format adjustments or unsupported types. For details, refer to: Snowflake datatype mappings.

Recommended Load Pattern: Initial and Incremental Loads

The optimal pattern for extracting data is to perform a once-off initial load, followed by regular incremental loads using the same Database Query component.

Initial Load

For the first run, the query typically selects all available data from a table and includes no filter clause in the Database Query component.

Example:

SELECT * FROM sales.orders;

Incremental Load

For subsequent loads, only new or updated records since the last load are fetched. The Database Query component must include a filter clause, usually on a high-watermark column such as `modified_timestamp` or an auto-incrementing ID.

Example using timestamp watermark:

SELECT * FROM sales.orders
WHERE modified_timestamp > '{{last_loaded_timestamp}}';

Here, `{{last_loaded_timestamp}}` would be substituted dynamically with the highest timestamp previously loaded.

Read more: Incremental Load / Data Replication Strategy (Matillion Exchange)

Additional Notes

  • The same Database Query component is re-used: for initial load the filter section is empty; for incremental, the filter is populated appropriately.
  • Ensure your source tables have an appropriate column for incremental filtering (e.g., a reliable timestamp or auto-increment ID).
  • Test both modes (initial and incremental) to validate completeness and correctness of loaded data.
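To make the initial/incremental split concrete, here is a minimal, self-contained Python sketch of the high-watermark pattern. It uses an in-memory SQLite table as a stand-in for the Vector source, with the same hypothetical table and column names as the examples above.

```python
import sqlite3

# In-memory SQLite table standing in for the Vector source table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, modified_timestamp TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, "2024-01-01T10:00:00"),
        (2, "2024-01-02T09:30:00"),
        (3, "2024-01-03T12:15:00"),
    ],
)

def extract(conn, last_loaded_timestamp=None):
    """Initial load when no watermark is given; incremental load otherwise."""
    if last_loaded_timestamp is None:
        # Initial load: no filter clause, select everything.
        return conn.execute("SELECT * FROM orders").fetchall()
    # Incremental load: strictly greater than the stored watermark.
    return conn.execute(
        "SELECT * FROM orders WHERE modified_timestamp > ?",
        (last_loaded_timestamp,),
    ).fetchall()

initial = extract(conn)                             # full load, all rows
incremental = extract(conn, "2024-01-02T09:30:00")  # only rows after watermark
```

In Matillion, the watermark would be held in a pipeline variable and substituted into the Database Query component's filter clause, rather than passed as a function parameter as in this sketch.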

By following these instructions, you can confidently query and integrate data from Actian Vector, tailoring your SQL and ETL patterns for reliability and performance.

Data Integration Architecture

One of the significant advantages of the ELT (Extract, Load, Transform) architecture is that it allows you to break down the data integration process into two distinct steps by loading data in advance of integration. This "divide and conquer" strategy streamlines the workflow: raw data is first extracted from source systems and loaded into Snowflake, and then, in a separate step, necessary transformations and integrations are applied. Effective data integration relies heavily on transforming the data to meet the requirements of downstream analytics or business processes, and the most reliable approach to this is through robust data transformation pipelines. With ELT, all transformation and integration tasks are performed directly inside the Snowflake target database, delivering key benefits—pipelines execute at cloud scale and speed, are triggered on demand, and there is no need for maintaining or paying for separate data processing infrastructure outside Snowflake.
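The two-step ELT flow described above can be sketched end to end. The snippet below uses an in-memory SQLite database as a stand-in for the Snowflake target, with hypothetical table names, to show raw data landing first and the transformation then running entirely inside the target database.

```python
import sqlite3

# Stand-in for the Snowflake target (hypothetical table names throughout).
target = sqlite3.connect(":memory:")

# Step 1 (Load): land raw extracted rows, untransformed, in a staging table.
target.execute("CREATE TABLE raw_orders (order_id INTEGER, amount TEXT)")
target.executemany(
    "INSERT INTO raw_orders VALUES (?, ?)",
    [(1, "19.99"), (2, "5.00"), (3, "12.50")],
)

# Step 2 (Transform): run the transformation inside the target database,
# e.g. casting types and aggregating, rather than in a separate ETL engine.
target.execute(
    """
    CREATE TABLE orders_summary AS
    SELECT COUNT(*) AS order_count,
           SUM(CAST(amount AS REAL)) AS total_amount
    FROM raw_orders
    """
)
summary = target.execute("SELECT * FROM orders_summary").fetchone()
```

Because the two steps are independent, a failed transformation can be re-run against the already-loaded staging table without re-extracting from the source, which is the practical payoff of the "divide and conquer" strategy.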

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.