Integrate data from AllegroGraph to Amazon Redshift using Matillion

Our AllegroGraph to Redshift connector enables seamless, code-free data transfers to Redshift in minutes, ensuring your data remains current without the complexity of manual scripting or ETL processes.


Extracting data from AllegroGraph to Amazon Redshift

Extracting data from AllegroGraph is an essential task for organizations that rely on robust graph databases but also need the advanced analytics and reporting capabilities of platforms like Amazon Redshift. Migrating and synchronizing data between these systems can unlock valuable insights by combining graph-based relationships with high-performance data warehousing. This article walks you through the key steps: creating the identity in AllegroGraph needed for secure integration; for Matillion ETL users, verifying that the appropriate JDBC driver is present (and acquiring it if not); addressing network connectivity so data can flow smoothly from the source (AllegroGraph) to the target (Redshift); and finally, querying and extracting data, covering both initial bulk loads and ongoing incremental updates. By following these steps, you will establish a streamlined pipeline from AllegroGraph into Redshift, enabling a powerful blend of graph-based and relational analytics in your organization.


What is AllegroGraph?

AllegroGraph, developed by Franz Inc., is an enterprise-grade graph database designed for large-scale, highly interconnected data. Supporting RDF standards, it offers advanced SPARQL queries, semantic inferencing, and reasoning, making it ideal for knowledge graphs, linked data, and AI. AllegroGraph efficiently manages billions of triples using indexing, distributed computing, and horizontal scalability. It supports multiple data types, including JSON-LD, text, and geospatial data. Enterprise features include ACID compliance, security authentication, high availability via clustering, and seamless integration with Python, Java, and other programming environments.


What is Amazon Redshift?

Amazon Redshift is a fully managed, cloud-based, petabyte-scale data warehouse, optimized for efficient querying and analysis of large datasets. Built on PostgreSQL, it uses columnar storage, massively parallel processing, and advanced query optimization for high performance. Its architecture separates compute and storage for dynamic scaling, with encryption for data in transit and at rest. Redshift integrates with AWS services like S3, Glue, and SageMaker, supporting seamless data lakes, ETL pipelines, and machine learning workflows. It offers standard SQL support and JDBC/ODBC connectivity, making it accessible to various analytics tools and programming languages.

Why Move Data from AllegroGraph into Amazon Redshift

Leveraging Data Copy from AllegroGraph to Redshift for Enhanced Analytics

A data engineer or architect may wish to copy data from AllegroGraph into Redshift for several key reasons. AllegroGraph often contains data that is rich and valuable, typically organized as graph or RDF data, which can offer unique insights. However, the true value of this data is realized when it is integrated with information from other sources, such as transactional systems or analytical databases. By copying the data into Redshift, a scalable analytical data warehouse, organizations are able to join AllegroGraph data with other datasets efficiently, enabling advanced analytics and reporting. Furthermore, using Redshift for data integration and heavy analytical querying avoids imposing additional workload and performance constraints on the AllegroGraph server itself, thereby ensuring that both systems continue to operate optimally within their intended roles.

Creating a User in AllegroGraph

This guide describes how to create a user in an AllegroGraph database repository. AllegroGraph manages users and roles to control access. User and role management is typically performed via the REST API, the AGWebView UI, or the Lisp, Python, and JavaScript client libraries. While AllegroGraph does not directly use SQL, you can perform user-management actions via HTTP requests.

Below, we provide instructions using the cURL command-line tool with the REST API, as this is platform-agnostic. Substitute values as appropriate for your installation.


Prerequisites

  • You must have administrator privileges.
  • Ensure your server URL, superuser credentials, and repository are at hand.

Instructions

1. Define the New User

Decide on the username and password for the new user. You may optionally assign the user to roles.

2. Issue the Create User Request

Use the following curl command to create a new user:

bash
curl -u <admin-user>:<admin-password> \
     -X POST \
     -H "Content-Type: application/json" \
     -d '{
           "username": "<new-username>",
           "password": "<new-password>",
           "roles": ["<role1>", "<role2>"]
         }' \
     http://<host>:<port>/users

  • Replace <admin-user> and <admin-password> with your administrator credentials.
  • Choose your <new-username> and <new-password>.
  • Replace <role1>, <role2>, etc., with roles appropriate for the user, or leave the "roles" array empty for no specific role assignment.
  • Use the correct <host> and <port> for your AllegroGraph instance (commonly port 10035).

Example

To create user alice with password Wonder1and!, assigning them to the user role:

bash
curl -u test:test \
     -X POST \
     -H "Content-Type: application/json" \
     -d '{
           "username": "alice",
           "password": "Wonder1and!",
           "roles": ["user"]
         }' \
     http://localhost:10035/users

3. Verify User Creation

You can list all users to verify:

bash
curl -u <admin-user>:<admin-password> \
     -X GET \
     http://<host>:<port>/users


For more information, see AllegroGraph REST API documentation.
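For scripted setups, the same REST call can be issued from Python's standard library. The sketch below mirrors the curl example above (the /users endpoint and JSON payload are taken from that example; adjust them to match your AllegroGraph server's REST API):

```python
import base64
import json
import urllib.request

def build_create_user_request(host, port, admin_user, admin_password,
                              username, password, roles=None):
    """Build an HTTP request mirroring the curl example above.

    The /users endpoint and JSON payload follow this article's example;
    adapt them as needed for your AllegroGraph installation.
    """
    url = f"http://{host}:{port}/users"
    body = json.dumps({
        "username": username,
        "password": password,
        "roles": roles or [],
    }).encode("utf-8")
    credentials = base64.b64encode(
        f"{admin_user}:{admin_password}".encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
    )

# Sending the request (requires a reachable AllegroGraph server):
# request = build_create_user_request("localhost", 10035, "test", "test",
#                                     "alice", "Wonder1and!", ["user"])
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Separating request construction from sending, as here, also makes the call easy to log or dry-run before pointing it at a production server.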

Installing the JDBC Driver

When integrating AllegroGraph with Matillion Data Productivity Cloud, you should be aware that the AllegroGraph JDBC driver is not included with Matillion by default. This exclusion is due to licensing or redistribution restrictions. As a result, you must manually download and install the driver, following the requirements outlined by both AllegroGraph and Matillion.

Step 1: Downloading the JDBC Driver

  1. Navigate to the official Franz Inc. AllegroGraph JDBC driver download page:
    https://franz.com/agraph/support/documentation/current/jdbc/

  2. On the download page, locate and select the Type 4 JDBC driver (sometimes called a “thin” driver), as this is generally preferred for most integration scenarios. The Type 4 driver does not require native libraries and is suitable for diverse deployment environments.

  3. Download the appropriate jar file for your environment and note its location on your local filesystem.

Step 2: Uploading the Driver into Matillion

  1. Review Matillion's official documentation on uploading external JDBC drivers here:
    https://docs.matillion.com/data-productivity-cloud/agent/docs/uploading-external-drivers/

  2. Follow the instructions to:
     • Navigate to the Agents section within the Matillion Data Productivity Cloud interface.
     • Upload the downloaded AllegroGraph JDBC driver jar file as an external driver.
     • Associate the uploaded driver with your relevant agent or environment as required.

  3. After the upload, verify the successful inclusion of the driver in the list of available external JDBC drivers for your agent.

Step 3: Configure and Use the JDBC Connection

  1. Consult Matillion’s guide to using external database drivers:
    https://docs.matillion.com/data-productivity-cloud/designer/docs/database-query/

  2. Use the Matillion Designer or Database Query components to set up and configure your AllegroGraph connection, specifying the necessary JDBC URLs and credentials as required by your data warehouse or workflow design.

By following the steps above, you can set up the required JDBC connectivity with AllegroGraph in Matillion Data Productivity Cloud.

Checking network connectivity

To enable successful communication between the Matillion Data Productivity Cloud and your AllegroGraph database, you must ensure that the AllegroGraph service is configured to permit incoming connections from the appropriate sources, depending on your deployment configuration:

  • Full SaaS agent configuration:
    Ensure that the AllegroGraph database allows inbound connections from the Matillion Data Productivity Cloud SaaS agent IP addresses. The full list of these approved IP addresses is maintained at the following URL: https://docs.matillion.com/data-productivity-cloud/security/docs/allowing-ip-addresses/. Update your firewall or access control lists accordingly to accept connections from these sources.

  • Hybrid SaaS deployment:
    When using a Hybrid SaaS setup, the connection from Matillion will originate from your own virtual private cloud (VPC) environment. Therefore, you must configure your AllegroGraph server to allow inbound connections from your VPC’s IP address range. If you need to check connectivity or troubleshoot your network access, utilize the tools and resources available at: https://exchange.matillion.com/data-productivity-cloud/pipeline/check-network-access/.

DNS considerations:
If you reference your AllegroGraph database using a DNS hostname rather than a direct IP address, make sure that your chosen deployment option (either the Full SaaS agent or the Hybrid SaaS agent) has appropriate network access configured to resolve the DNS name. The agent must be able to look up and resolve this hostname to successfully connect to the AllegroGraph instance.

Querying Data from AllegroGraph Database

This guide explains how to query data from an AllegroGraph database for ETL (Extract, Transform, Load) scenarios, with example SELECT statements, handling of datatype differences, and recommended patterns for initial and incremental data loads. The Database Query component is referenced as part of tools such as Matillion ETL; more info: Matillion Incremental Loads.


Example: SQL SELECT Statements for AllegroGraph

AllegroGraph can be accessed via its SQL-like interface to perform data extractions, usually through tools that support JDBC or ODBC connections.

  • Standard Query Example:

    SELECT subject, predicate, object
      FROM allegrograph.triples

  • Projection with Filtering:
    (Retrieve all people named "Alice")

    SELECT subject, predicate, object
      FROM allegrograph.triples
      WHERE object = 'Alice'

  • Using Functions & Casting:
    (Convert a datatype if needed, e.g., date-time string to timestamp)

    SELECT CAST(object AS TIMESTAMP) as person_birthdate
      FROM allegrograph.triples
      WHERE predicate = '<http://example.org/birthdate>'

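If your setup exposes AllegroGraph's native SPARQL interface rather than a SQL layer, the same extraction can be expressed as a SPARQL query against the repository's HTTP endpoint (AllegroGraph follows the SPARQL 1.1 Protocol, serving each repository under /repositories/<name>). A sketch, with an illustrative repository name:

```python
from urllib.parse import urlencode

def build_sparql_query_url(host, port, repository, query):
    """Build a SPARQL 1.1 Protocol GET URL for an AllegroGraph repository."""
    endpoint = f"http://{host}:{port}/repositories/{repository}"
    return endpoint + "?" + urlencode({"query": query})

# SPARQL equivalent of the filtered SELECT above (object = 'Alice'):
query = """
SELECT ?subject ?predicate ?object
WHERE { ?subject ?predicate ?object . FILTER (str(?object) = "Alice") }
"""
# "mydata" is a placeholder repository name.
url = build_sparql_query_url("localhost", 10035, "mydata", query)
# Fetch with urllib.request.urlopen(url), sending an Accept header such as
# application/sparql-results+json and authenticating as shown earlier.
```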

Datatype Conversion between AllegroGraph and Redshift

When extracting from AllegroGraph for Redshift, be aware of datatype differences:

AllegroGraph Datatype | Corresponding Redshift Datatype | Notes
--- | --- | ---
STRING | VARCHAR | Straightforward mapping
INTEGER | INTEGER | No conversion needed
FLOAT | DOUBLE PRECISION | No conversion needed
DATE/TIME (xsd:date) | TIMESTAMP | Use CAST or ETL datetime conversion
URI/Resource | VARCHAR | Treat as VARCHAR in Redshift
BOOLEAN | BOOLEAN | No conversion needed

Be sure to handle explicit datatype casts either in your SQL (CAST()), your ETL tool, or your Redshift table design.
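If you script any part of the extraction, the mapping table above can be centralized in a small helper. This is an illustrative sketch (the dictionary and function names are not part of any AllegroGraph or Redshift API):

```python
# Hypothetical mapping of AllegroGraph/XSD datatypes to Redshift column
# types, following the table above; extend it for the datatypes you use.
XSD_TO_REDSHIFT = {
    "string": "VARCHAR",
    "integer": "INTEGER",
    "float": "DOUBLE PRECISION",
    "dateTime": "TIMESTAMP",
    "date": "TIMESTAMP",
    "boolean": "BOOLEAN",
}

def redshift_type(xsd_datatype):
    """Map an xsd:* datatype URI (or bare name) to a Redshift column type."""
    local_name = xsd_datatype.rsplit("#", 1)[-1]       # strip XSD namespace
    return XSD_TO_REDSHIFT.get(local_name, "VARCHAR")  # URIs/unknown -> VARCHAR
```

Defaulting unknown types to VARCHAR matches the table's treatment of URIs and resources.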


Load Strategies: Initial and Incremental

The best pattern for ETL from AllegroGraph is to perform a once-off initial load followed by periodic incremental loads, as described in detail here: Matillion Incremental Load Strategy.

Initial Load Example

For the initial load (complete dataset), the Database Query component runs a simple SELECT statement with no filter, loading all relevant AllegroGraph rows:

-- Initial, all-dataset extract
SELECT subject, predicate, object
FROM allegrograph.triples

Incremental Load Example

For incremental loads (since the last ETL run), add a filter based on a change-tracking field, such as a modification timestamp or sequence number. For example, if your triples have a modified_at field or an event log table:

-- Incremental extract: load only new or updated records
SELECT subject, predicate, object
FROM allegrograph.triples
WHERE modified_at > '${last_successful_load_timestamp}'

Replace modified_at and the placeholder with your actual fields or high-water-mark strategy.

Notes

  • Use the same Database Query Component for both initial and incremental loads; only the SQL changes (filters).
  • Store and update the high water mark (e.g., max timestamp or ID processed) in your orchestration logic.
  • Read more: Matillion Data Replication Strategy
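The high-water-mark bookkeeping above can be sketched as follows. The state store here is a local JSON file purely for illustration; in Matillion you would typically hold the mark in a pipeline variable or an audit table, and the column name modified_at is the same assumed field used earlier:

```python
import json
from pathlib import Path

# Illustrative state file standing in for an audit table or pipeline variable.
STATE_FILE = Path("allegrograph_watermark.json")

def load_watermark(default="1970-01-01 00:00:00"):
    """Return the last successful load timestamp, or a default for first runs."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())["last_successful_load"]
    return default

def save_watermark(timestamp):
    """Persist the new high-water mark after a successful load."""
    STATE_FILE.write_text(json.dumps({"last_successful_load": timestamp}))

def incremental_query(watermark):
    """Build the incremental extract, using the same filter pattern as above."""
    return (
        "SELECT subject, predicate, object "
        "FROM allegrograph.triples "
        f"WHERE modified_at > '{watermark}'"
    )
```

On the first run the default watermark makes the incremental query behave like an initial load, which is one common way to unify the two patterns.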

By following these guidelines, you can efficiently query and migrate data from AllegroGraph to downstream systems such as Amazon Redshift while maintaining data consistency and optimizing ETL performance.

Data Integration Architecture

Loading data in advance of integration is a fundamental advantage of the Extract, Load, Transform (ELT) architecture, as it allows the overall data integration process to be divided into two manageable steps. This divide-and-conquer approach enables organizations to first focus on reliably moving raw data into the Redshift environment, before tackling the often more complex challenge of integrating and transforming that data. Successful data integration requires meaningful transformation of diverse data sources, and this is best achieved through robust data transformation pipelines that execute complex business logic and standardization routines. A further benefit of the ELT approach is that both data transformation and integration are performed directly inside the target Redshift database. This centralized processing is not only fast and inherently scalable—leveraging Redshift's parallelism and native compute resources—but also cost-efficient, since it eliminates the need for separate processing infrastructure and allows transformations to run on-demand, exactly where your data resides.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.