Integrate data from AllegroGraph to Snowflake using Matillion

The AllegroGraph to Snowflake connector seamlessly transfers your data to Snowflake within minutes, ensuring it remains current without requiring manual coding or intricate ETL scripting.


Extracting data from AllegroGraph to Snowflake

Extracting data from AllegroGraph is a common requirement for organizations seeking to leverage the scalability and analytical power of modern data warehousing solutions like Snowflake. This article provides step-by-step guidance on moving data from AllegroGraph, a leading RDF graph database, into Snowflake for advanced processing and analysis. We will begin by demonstrating how to create an identity within AllegroGraph to facilitate secure access and permission management. For those utilizing Matillion as their ETL tool, we will outline how to check for, or obtain, the required JDBC driver to connect to AllegroGraph. Ensuring robust network connectivity between the source and target platforms is also essential for a successful extraction and load process, and this will be covered in detail. Finally, the article will address two core data acquisition scenarios: performing an initial, comprehensive data extraction, and configuring incremental queries to load only new or updated data going forward. By following these steps, you will be well-equipped to integrate AllegroGraph data into Snowflake and maximize its analytical value.


What is AllegroGraph?

AllegroGraph is a high-performance, enterprise-grade graph database developed by Franz Inc., optimized for storing, querying, and managing highly interconnected datasets structured as RDF (Resource Description Framework) triples. Supporting both SPARQL and Prolog-based querying, AllegroGraph is designed to handle complex knowledge graphs and semantic web applications, offering robust reasoning capabilities, ontology management, and support for advanced features such as geospatial and temporal data. The system provides ACID-compliant transactions, horizontal scaling via Multi-Master Replication, and integration with various AI and data analytics tools. Its mature ecosystem and comprehensive APIs make AllegroGraph a robust choice for organizations requiring scalable, standards-compliant solutions for semantic data integration, entity resolution, and knowledge discovery.


What is Snowflake?

Snowflake is a cloud-native data platform designed for high scalability, flexibility, and ease of use, targeting data warehousing and analytics. Its unique architecture separates storage and compute, enabling elastic scaling and performance for many users. Snowflake supports structured and semi-structured data (e.g., JSON, Avro, Parquet), offers managed maintenance, strong security (granular access, encryption), and integrates with popular BI and ETL tools. With multi-cloud support and abstraction of infrastructure, Snowflake lets teams focus on data engineering, analytics, and machine learning, rather than hardware or complex software management.

Why Move Data from AllegroGraph into Snowflake

Benefits of Transferring Data from AllegroGraph to Snowflake for Advanced Analytics

A data engineer or architect may choose to copy data from AllegroGraph into Snowflake for several compelling reasons. AllegroGraph often contains highly valuable data, particularly in the form of semantic or graph-structured information, which can be instrumental when combined with data from other sources. By integrating this data within Snowflake, organizations can leverage advanced analytical capabilities and perform comprehensive data transformations alongside other enterprise datasets. Furthermore, executing these data workflows in Snowflake, rather than directly on AllegroGraph, ensures that computational workloads associated with data integration and analysis do not impact the performance or stability of the AllegroGraph database itself. This approach not only preserves the responsiveness of AllegroGraph but also centralizes analytics in a scalable and optimized environment.

Creating a User in AllegroGraph

This guide provides step-by-step instructions for creating a user in an AllegroGraph database using the AGWebView browser interface and through RESTful API calls.


Prerequisites

  • Administrative access to an AllegroGraph repository.
  • AllegroGraph server running and accessible.
  • (Optional) curl or another tool to make HTTP REST API requests.

Creating a User Using AGWebView

  1. Log In:
    Open your AllegroGraph WebView interface (typically at http://your-server:10035) in a web browser and log in with administrative credentials.

  2. Open Manage Users:
    In the top menu, navigate to Admin > Manage Users.

  3. Add New User:
    Click on the Add User button.

  4. Fill in User Details:
    • User Name: Enter the desired username.
    • Password: Set a strong password.
    • Confirm Password: Re-type the password.
    • Roles: Assign roles as needed (for example, 'user' or 'admin').
    • Enabled: Ensure the user is enabled.

  5. Save:
    Click Save to create the user.


Creating a User with the REST API

To create a user programmatically, use the AllegroGraph user-management REST API.

Example using curl. Per the AllegroGraph HTTP protocol, users are created with a PUT request to /users/<name>; verify the exact endpoint against the documentation for your server version:

curl -X PUT \
     -u <admin_user>:<admin_password> \
     "http://your-server:10035/users/newuser?password=StrongPassword"

To assign a role to the new user, make a follow-up request:

curl -X PUT \
     -u <admin_user>:<admin_password> \
     "http://your-server:10035/users/newuser/roles/user"

Replace:

  • <admin_user> and <admin_password> with your admin credentials.
  • your-server:10035 with your server and port.
  • The example username, password, and roles with values appropriate for your environment.

Notes

  • AllegroGraph does not use SQL for security/user management. All such operations are handled through its REST API or the AGWebView interface.
  • Roles such as "user" or "admin" grant specific permissions; consult the official documentation for role details.
  • Additional settings, such as repository-level permissions, can be configured as needed after user creation.

Installing the JDBC driver

At the time of writing, the AllegroGraph JDBC driver is not bundled with Matillion Data Productivity Cloud by default, typically because of licensing or redistribution restrictions imposed by the vendor. You will need to manually download and install the JDBC driver before establishing connectivity between Matillion and an AllegroGraph database.

To complete this process, follow the instructions below:

  1. Download the AllegroGraph JDBC Driver
    Visit the official AllegroGraph documentation page to access the JDBC driver download:
    https://franz.com/agraph/support/documentation/current/jdbc/
    When downloading the driver, look for a Type 4 JDBC driver, which is preferred for its platform independence and compatibility with most Java-based applications, including Matillion.

  2. Prepare the JDBC JAR File
    Once downloaded, locate the JAR file for the AllegroGraph JDBC driver on your system. This is the file you will upload into the Matillion environment.

  3. Upload the JDBC Driver to Matillion
    In order to use the driver within Matillion Data Productivity Cloud, you must upload it as an external driver. Detailed instructions on how to perform this upload can be found here:
    Uploading External Drivers into Matillion Data Productivity Cloud

  4. Configure and Use the Driver in Matillion
    After successfully uploading the driver, follow the Matillion database connection setup process to establish connectivity to your AllegroGraph instance. Specific usage instructions, including connection configuration and query execution, are available at:
    Database Query Usage with External JDBC Drivers

Please ensure your use of the AllegroGraph JDBC driver complies with relevant licensing terms specified on the Franz Inc. site.

Checking network connectivity

To enable connectivity between Matillion Data Productivity Cloud and your AllegroGraph database, ensure that the AllegroGraph server accepts incoming network connections from the Matillion agent. Depending on your deployment, this may require firewall or security-group rules that allow traffic to the AllegroGraph HTTP port (10035 by default).

DNS Resolution:
If the AllegroGraph database is referenced by Domain Name System (DNS), ensure that the Matillion Full SaaS or Hybrid SaaS agent is able to resolve the AllegroGraph hostname to the correct IP address from their respective networks. This may require configuring appropriate DNS settings or records to ensure connectivity.
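Once DNS and firewall rules are in place, a quick TCP probe from the agent's network can confirm basic reachability before you attempt a full JDBC connection. The sketch below only verifies that a TCP connection can be opened to the AllegroGraph HTTP port; it does not validate credentials or the AllegroGraph protocol itself, and the function name is illustrative:

```python
import socket

def can_reach(host, port=10035, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS resolution failures.
        return False
```

Running this from the same network segment as the Matillion agent gives a fast pass/fail signal; a False result usually points to DNS, routing, or firewall configuration rather than AllegroGraph itself.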

Querying Data from an AllegroGraph Database

This guide demonstrates how to query data from an AllegroGraph database, focusing on extracting data in a way compatible with ETL/ELT processes, such as with Snowflake, and using patterns for initial and incremental loads.

1. AllegroGraph Query Examples Using SQL SELECT Statements

Although AllegroGraph is primarily an RDF (Resource Description Framework) graph database, it can expose an SQL-like interface (for example, through a JDBC driver) for users more comfortable with relational queries. Below are examples of how such queries may look:

Example: Basic SELECT Statement

SELECT subject, predicate, object
FROM triple
WHERE predicate = '<http://example.org/vocab#status>'

Example: Joining Triple Data

SELECT t1.subject, t1.object AS name, t2.object AS email
FROM triple t1
JOIN triple t2
  ON t1.subject = t2.subject
WHERE t1.predicate = '<http://xmlns.com/foaf/0.1/name>'
  AND t2.predicate = '<http://xmlns.com/foaf/0.1/mbox>'

Example: Filtering with WHERE

SELECT subject, object
FROM triple
WHERE predicate = '<http://example.org/vocab#created>'
  AND object > "2023-01-01T00:00:00Z"^^xsd:dateTime
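Since AllegroGraph's native query language is SPARQL, each of the SQL-style triple queries above has a direct SPARQL equivalent. The helper below (name and structure are illustrative, not part of any AllegroGraph API) builds the SPARQL form of the filtered SELECT, which can be useful if your extraction path talks to the SPARQL endpoint instead of the JDBC interface:

```python
def triple_select_to_sparql(predicate_iri, min_datetime=None):
    """Build a SPARQL SELECT equivalent to the SQL-style triple queries above.

    predicate_iri: a full IRI in angle brackets, e.g. '<http://example.org/vocab#created>'
    min_datetime:  optional xsd:dateTime lexical value used as a lower bound.
    """
    filter_clause = ""
    if min_datetime is not None:
        filter_clause = f'  FILTER (?object > "{min_datetime}"^^xsd:dateTime)\n'
    return (
        "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n"
        "SELECT ?subject ?object\n"
        "WHERE {\n"
        f"  ?subject {predicate_iri} ?object .\n"
        f"{filter_clause}"
        "}"
    )
```

For example, `triple_select_to_sparql("<http://example.org/vocab#created>", "2023-01-01T00:00:00Z")` produces the SPARQL counterpart of the filtered example above.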

2. Datatype Conversion Between AllegroGraph and Snowflake

Because AllegroGraph stores RDF data (often as strings, URIs, literals with types), datatype conversion is sometimes required when ingesting into Snowflake:

  • RDF literal to VARCHAR/STRING: Most literals can be mapped directly.
  • xsd:dateTime or xsd:date to DATE/TIMESTAMP: Convert RDF xsd:dateTime literals to Snowflake DATE/TIMESTAMP types.
  • Boolean: RDF boolean values ("true"^^xsd:boolean) map to the native Snowflake BOOLEAN type.
  • Numeric types: Convert RDF-encoded numerics to INTEGER, FLOAT, or NUMBER as supported.

Proper casting or transformation may be necessary during the ETL load process. For more details, refer to the Snowflake and AllegroGraph data type documentation, and ensure your ETL tool handles these conversions.
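The mappings above can be sketched as a small conversion function. This is an illustrative example only (the function name and the set of handled datatypes are assumptions, not a Snowflake or AllegroGraph API); it turns an RDF typed literal into a Python value that loads cleanly into the corresponding Snowflake type:

```python
from datetime import datetime, date

def rdf_literal_to_python(lexical, datatype=None):
    """Map an RDF typed literal to a Python value suitable for Snowflake loading."""
    if datatype == "xsd:boolean":
        return lexical.lower() == "true"                 # -> Snowflake BOOLEAN
    if datatype == "xsd:integer":
        return int(lexical)                              # -> Snowflake NUMBER/INTEGER
    if datatype in ("xsd:decimal", "xsd:double", "xsd:float"):
        return float(lexical)                            # -> Snowflake FLOAT/NUMBER
    if datatype == "xsd:dateTime":
        # fromisoformat does not accept a trailing 'Z', so normalize it first.
        return datetime.fromisoformat(lexical.replace("Z", "+00:00"))  # -> TIMESTAMP_TZ
    if datatype == "xsd:date":
        return date.fromisoformat(lexical)               # -> Snowflake DATE
    return lexical                                       # plain literals/IRIs -> VARCHAR
```

In a Matillion pipeline these conversions would typically be expressed as casts in a transformation component rather than in Python, but the mapping logic is the same.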

3. Recommended Data Load Pattern

The best practice for querying and loading data from AllegroGraph is to perform:

  1. A One-off Initial Load
  2. Subsequent Incremental Loads

Both can use the same Database Query component in your ETL tool (e.g., Matillion).

Initial Load Pattern

  • Query Pattern: The SELECT statement reads the entire target dataset.
  • No filter clause is present.

Example:

SELECT subject, predicate, object
FROM triple

Incremental Load Pattern

  • Query Pattern: The SELECT statement includes a filter clause to select only changed/new records, based on a suitable field (e.g., a created or modified timestamp).
  • The filter is expressed in the WHERE clause.

Example:

SELECT subject, predicate, object
FROM triple
WHERE predicate = '<http://example.org/vocab#modified>'
  AND object > '${LAST_UPDATED_AT}'^^xsd:dateTime

For more details about incremental vs. initial loading strategies, see: Incremental Load Data Replication Strategy (Matillion Exchange)


Tip: Use the Database Query component in both cases—just adjust the filter as necessary for incremental loading, ensuring more efficient ETL operations.

Note: When using parameterized queries (e.g., ${LAST_UPDATED_AT} or similar), ensure parameters are correctly defined in your workflow.
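The initial/incremental pattern can be sketched as a single query builder driven by a stored high-water mark. This is an illustrative sketch, not Matillion functionality: the function name is an assumption, and the query text follows the triple-style examples used in this article, with the watermark playing the role of ${LAST_UPDATED_AT}:

```python
def build_extraction_query(last_updated_at=None):
    """Return the initial-load query when no watermark exists, else the incremental query."""
    if last_updated_at is None:
        # Initial load: no watermark yet, so read the entire dataset.
        return "SELECT subject, predicate, object\nFROM triple"
    # Incremental load: only rows modified after the stored high-water mark.
    return (
        "SELECT subject, predicate, object\n"
        "FROM triple\n"
        "WHERE predicate = '<http://example.org/vocab#modified>'\n"
        f"  AND object > '{last_updated_at}'^^xsd:dateTime"
    )
```

After each successful incremental run, the pipeline would persist the maximum modified timestamp it observed and pass it in as last_updated_at on the next run.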

Data Integration Architecture

Loading data in advance of integration exemplifies the "divide and conquer" strategy, as it separates the process into two distinct steps: extraction and loading, followed by transformation and integration. This approach is a hallmark of the ELT (Extract, Load, Transform) architecture, where raw data is first loaded into a centralized repository like Snowflake before any transformation or integration occurs. Effective data integration necessitates data transformation, and this is most efficiently achieved using transformation pipelines that can automate and orchestrate complex data workflows. A key advantage of the ELT approach is that both data transformation and integration activities take place inside the target Snowflake database. This design leverages Snowflake’s high-performance, scalable computing resources, enabling on-demand processing without requiring investment in separate data processing infrastructure. As a result, organizations can realize faster integration, more responsive analytics, and cost savings, as all transformations rely on Snowflake’s elastic architecture rather than external tools.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.