Integrate data from TigerGraph to Snowflake using Matillion

The TigerGraph to Snowflake connector seamlessly transfers your data to Snowflake within minutes, ensuring it remains current without the need for manual coding or complex ETL scripts.


Extracting data from TigerGraph to Snowflake

Extracting data from TigerGraph is a vital step in enabling advanced analytics and cross-platform reporting by integrating your graph data with other systems such as Snowflake. To accomplish this efficiently and securely, there are several important configuration and connectivity considerations to address before initiating any data extraction or loading processes.

In this article, we will guide you through the key steps involved in moving data from TigerGraph to Snowflake. You will learn how to create an appropriate identity within TigerGraph to ensure secure access to your data. For users employing Matillion, we will discuss how to check for and, if necessary, acquire the required JDBC driver for TigerGraph connectivity. We will also cover best practices for establishing and verifying reliable network connectivity between TigerGraph and Snowflake, ensuring data flows seamlessly from source to target.

Finally, we will walk through strategies for both initial and incremental data querying, so that you can efficiently extract and load only the necessary data into Snowflake. Whether you are new to TigerGraph-Snowflake integrations or seeking to optimize your current workflows, this article will provide you with a comprehensive starting point.


What is TigerGraph?

TigerGraph is an enterprise-grade, distributed graph database designed to handle large-scale, complex, and highly-connected data. Optimized for speed and scalability, it employs a native parallel graph processing engine and a high-performance, property graph model that supports ACID transactions. TigerGraph’s GSQL language enables expressive querying and deep analytics across billions of vertices and edges, making it well-suited for use cases such as fraud detection, recommendations, supply chain analysis, and network monitoring. The platform provides rich integration capabilities, including streaming data ingestion and graph algorithms, and features built-in support for horizontal scaling, high availability, and workload management in both on-premises and cloud environments.


What is Snowflake?

Snowflake is a cloud-native data platform offering scalable data warehousing, analytics, and seamless data sharing. Unlike traditional databases, it uses a multi-cluster, shared-data architecture, letting compute and storage scale independently so multiple workloads can run without resource contention. Snowflake supports semi-structured data formats (JSON, Avro, Parquet), robust SQL analytics, and secure, role-based access controls. It runs on AWS, Azure, and Google Cloud for maximum flexibility and performance. These features make Snowflake well-suited for modern data engineering and analytics pipelines, especially for enterprise use cases requiring efficient, scalable, and secure data management across cloud environments.

Why Move Data from TigerGraph into Snowflake

Unlocking Analytics: The Benefits of Copying Data from TigerGraph to Snowflake

A data engineer or architect might choose to copy data from TigerGraph into Snowflake for several compelling reasons. Firstly, TigerGraph often holds complex, highly-connected data with significant analytical value, especially for use cases such as fraud detection, customer 360, or network analysis. However, the full value of this data is often realized when it is integrated with other organizational data sources—such as transactional databases, data lakes, or third-party datasets—to enable richer analytics and deeper business insights. By transferring TigerGraph data into Snowflake, organizations can leverage Snowflake’s powerful data integration and analytical capabilities to perform these comprehensive analyses without burdening the operational workload of the TigerGraph system. This not only helps maintain TigerGraph’s performance for graph queries but also streamlines large-scale data processing and analysis within an environment optimized for such operations.

Creating an Identity in TigerGraph

TigerGraph users (identities) are managed at the database level. System and database administrators can grant and control access using the built-in privilege and role-based access control model. The following instructions describe how to create a new user in TigerGraph using the `CREATE USER` statement in GSQL and configure their privileges.

Prerequisites

  • Ensure you have administrative access to your TigerGraph system, typically as a user with the `superuser` or `admin` role.
  • Open a GSQL shell session (`gsql`) as an authorized administrator.

1. Connect to GSQL

```bash
gsql
```

2. Create a New User

```sql
CREATE USER new_username WITH PASSWORD "strongpassword";
```

  • Replace `new_username` with the desired username.
  • Replace `"strongpassword"` with a secure password.

Example:

```sql
CREATE USER analyst1 WITH PASSWORD "AnalystP@ss2024!";
```

3. (Optional) Assign the User to a Role

Assign a role to specify the privileges for the new user. For example, to grant the built-in read-only role:

```sql
GRANT ROLE globalreader TO analyst1;
```

Other built-in roles include: `superuser`, `admin`, `designer`, `querywriter`, `queryreader`, etc.

4. (Optional) Grant Database-Specific Privileges

You may also grant user privileges on a specific graph. For example, to assign the `designer` role on a graph named `MyGraph`:

```sql
GRANT ROLE designer ON GRAPH MyGraph TO analyst1;
```

5. Verify User Creation

List all users to verify:

```sql
SHOW USERS;
```

Notes

  • Passwords must follow TigerGraph password policies (minimum length, complexity, etc.).
  • Administrative users can revoke roles using the `REVOKE ROLE ... FROM ...` syntax.
  • For full role and privilege details, consult the TigerGraph Documentation > User Management.

Reference: TigerGraph GSQL - CREATE USER Statement

Installing the JDBC Driver

At the time of writing, the TigerGraph JDBC driver is not bundled with Matillion Data Productivity Cloud by default. This is due to licensing or redistribution restrictions associated with the TigerGraph driver. To enable connectivity between Matillion’s Data Productivity Cloud and your TigerGraph database, you'll need to manually obtain and install the driver.

1. Download the TigerGraph JDBC Driver

  1. Visit the TigerGraph official documentation link for JDBC drivers: TigerGraph JDBC Driver Download.
  2. Look for a Type 4 JDBC driver. Type 4 drivers are pure Java drivers, which do not require native libraries and are generally the preferred choice for most JDBC integrations.

2. Upload the Driver to Matillion Data Productivity Cloud

After downloading the appropriate driver .jar file, you'll need to add it as an external driver in Matillion Data Productivity Cloud.

3. Configuring and Using the TigerGraph Connection

Once the driver is installed and available, you can proceed to set up your TigerGraph connection within Matillion's Data Productivity Cloud Designer.

Be sure to consult both TigerGraph and Matillion documentation periodically for any updates related to driver compatibility, installation processes, or best practices.

Checking network connectivity

To ensure seamless connectivity between Matillion Data Productivity Cloud and your TigerGraph database, you must configure your TigerGraph instance to allow incoming network connections. The required configuration depends on your deployment type:

Full SaaS Agent Configuration: If you are deploying in a Full SaaS agent model, you must permit incoming connections from the IP addresses used by Matillion SaaS agents. Refer to the official list of allowed IP addresses here: Matillion Allowed IP Addresses. Ensure these IP addresses are added to your TigerGraph database's access controls, firewall rules, or network security groups as appropriate.

Hybrid SaaS Deployment: For Hybrid SaaS deployments, the network traffic to TigerGraph originates from your own cloud environment, specifically your virtual private cloud (VPC). In this case, configure TigerGraph to allow incoming connections from your VPC's IP range(s). You can verify connectivity and obtain helpful utilities using the tools provided here: Matillion Exchange: Pipeline Network Check.

DNS Resolution: If your TigerGraph database is referenced via a DNS hostname (rather than a direct IP address), ensure that the Full SaaS or Hybrid SaaS agent can resolve the DNS name to the correct IP address. The agent must have network access to a DNS server that can resolve the hostname for successful connections.
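Before wiring up the agent, it can help to confirm from a machine on the same network that the TigerGraph hostname resolves and the port accepts TCP connections. The sketch below is a generic check, not a Matillion utility; the hostname and port are placeholders, so substitute your own values (TigerGraph's REST++ endpoint commonly listens on port 9000, but confirm against your deployment).

```python
import socket

def check_connectivity(host: str, port: int, timeout: float = 5.0) -> dict:
    """Resolve a hostname and attempt a TCP connection, returning a small report."""
    report = {"host": host, "port": port, "resolved_ip": None, "tcp_ok": False}
    try:
        # DNS resolution: the agent must be able to do this for the TigerGraph host.
        report["resolved_ip"] = socket.gethostbyname(host)
    except socket.gaierror:
        return report  # DNS failed; no point attempting TCP
    try:
        # TCP reachability: confirms firewall / security-group rules allow the connection.
        with socket.create_connection((host, port), timeout=timeout):
            report["tcp_ok"] = True
    except OSError:
        pass  # connection refused, timed out, or unreachable
    return report

# Example (placeholder host/port):
# check_connectivity("tigergraph.example.internal", 9000)
```

If DNS resolution fails here, fix name resolution for the agent's network before revisiting firewall rules; the two failure modes are easy to conflate.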

Querying Data from TigerGraph Database

This guide explains to technical users how to query data from a TigerGraph database, especially in the context of data integration and ETL processes using familiar SQL `SELECT`-style statements.

SQL-Style Query Examples

TigerGraph uses its proprietary graph query language, GSQL, but data integrations often require SQL-like statements to interface with ETL tools such as those from Matillion. When querying TigerGraph through these tools, queries are usually expressed in a form similar to a SQL `SELECT`. Here are a few examples:

```sql
-- Example: Selecting all Person vertices
SELECT * FROM Person;

-- Example: Selecting edges of type 'Friend' with properties
SELECT * FROM Friend;

-- Example: Selecting properties from Movie vertices
SELECT title, release_date, genre FROM Movie;

-- Example: Filtering Actors by age
SELECT name, age FROM Actor WHERE age > 40;
```

Note: Under the hood, TigerGraph does not support SQL, but provides JDBC/ODBC connectors or REST endpoints that enable SQL-like querying for external tools.
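As one illustration of the REST-endpoint route, TigerGraph exposes built-in REST++ endpoints for retrieving vertices, which accept query-string parameters such as `select` and `filter`. The helper below only builds such a URL (it performs no network call); the base URL and graph name are placeholders, and you should verify the exact endpoint and parameter names against the TigerGraph REST++ documentation for your version.

```python
from typing import Optional
from urllib.parse import urlencode

def vertices_url(base: str, graph: str, vertex_type: str,
                 select: Optional[str] = None, flt: Optional[str] = None) -> str:
    """Build a TigerGraph REST++ built-in vertex-retrieval URL.

    Roughly equivalent to: SELECT <select> FROM <vertex_type> WHERE <flt>
    """
    params = {}
    if select:
        params["select"] = select  # comma-separated attribute list
    if flt:
        params["filter"] = flt     # e.g. "age>40"
    query = f"?{urlencode(params)}" if params else ""
    return f"{base}/graph/{graph}/vertices/{vertex_type}{query}"

# SELECT name, age FROM Actor WHERE age > 40
url = vertices_url("http://tg-host:9000", "MyGraph", "Actor",
                   select="name,age", flt="age>40")
print(url)
```

This is useful for spot-checking data outside the ETL tool, for example with `curl` against the generated URL.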

Datatype Conversion between TigerGraph and Snowflake

When loading data from TigerGraph into environments such as Snowflake, expect datatype conversion. Some property types in TigerGraph may not have exact equivalents in Snowflake. For example:

| TigerGraph Type | Closest Snowflake Type |
|---|---|
| INT | NUMBER |
| DOUBLE | FLOAT |
| STRING | VARCHAR |
| DATETIME | TIMESTAMP_NTZ/TZ/LTZ |
| BOOL | BOOLEAN |

Best practice: Review column mapping and datatype conversions to ensure integrity during ETL jobs.
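One way to keep such a mapping reviewable is to encode it as a lookup table used when generating target DDL. The sketch below follows the table above; the extra entries and the `VARIANT` fallback for graph-specific types (lists, maps, vertex references) are our own convention, not a Matillion or Snowflake default.

```python
# Illustrative TigerGraph -> Snowflake type mapping (extend as needed).
TG_TO_SNOWFLAKE = {
    "INT": "NUMBER",
    "DOUBLE": "FLOAT",
    "STRING": "VARCHAR",
    "DATETIME": "TIMESTAMP_NTZ",  # pick NTZ/TZ/LTZ per your timezone policy
    "BOOL": "BOOLEAN",
}

def snowflake_type(tg_type: str) -> str:
    """Return the closest Snowflake type; VARIANT is our catch-all for
    graph-specific types (LIST, MAP, ...) with no scalar equivalent."""
    return TG_TO_SNOWFLAKE.get(tg_type.upper(), "VARIANT")

def column_ddl(columns: dict) -> str:
    """Render a column list for a CREATE TABLE statement."""
    return ", ".join(f"{name} {snowflake_type(t)}" for name, t in columns.items())

print(column_ddl({"name": "STRING", "age": "INT", "joined": "DATETIME"}))
# name VARCHAR, age NUMBER, joined TIMESTAMP_NTZ
```

Centralizing the mapping makes it easy to audit during ETL reviews and to adjust when a new property type appears in the source graph.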

Initial Load vs. Incremental Load Patterns

A robust pattern to import data efficiently is to combine an initial "full" load with ongoing incremental loads. Use the same Database Query component (e.g., in Matillion) for both, but vary your query's filter clause.

1. Initial (Full) Load

On the first load, the Database Query component runs without a filter clause, extracting all relevant data:

```sql
-- Initial load: Select ALL records
SELECT * FROM Person;
```

2. Incremental Loads

For subsequent imports, apply a filter to select only new or updated records, often based on a timestamp or version field:

```sql
-- Incremental load: Select only updated or new records since last load
SELECT * FROM Person
WHERE last_modified > '2024-05-01 00:00:00';
```

Adjust the timestamp based on your last successful incremental sync.
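The filter can be parameterized from a stored high-water mark so the same component serves both patterns. A minimal sketch, assuming the table and timestamp column from the example; persisting the watermark between runs is left to your orchestration tool:

```python
from datetime import datetime
from typing import Optional

def incremental_query(table: str, ts_column: str,
                      last_sync: Optional[datetime]) -> str:
    """Return a full-load query when no watermark exists yet,
    otherwise a query filtered on the high-water-mark timestamp."""
    base = f"SELECT * FROM {table}"
    if last_sync is None:
        return f"{base};"  # initial (full) load
    return f"{base} WHERE {ts_column} > '{last_sync:%Y-%m-%d %H:%M:%S}';"

print(incremental_query("Person", "last_modified", None))
print(incremental_query("Person", "last_modified", datetime(2024, 5, 1)))
# SELECT * FROM Person;
# SELECT * FROM Person WHERE last_modified > '2024-05-01 00:00:00';
```

After each successful run, record the maximum `last_modified` value loaded and pass it as `last_sync` on the next run; using the loaded maximum rather than the wall-clock start time avoids gaps from in-flight updates.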

For detailed strategies on incremental load implementation patterns, see Matillion Exchange: Incremental Load Data Replication Strategy.


By following this approach, you efficiently manage data extraction from TigerGraph to downstream systems (e.g., Snowflake), supporting both bulk onboarding and ongoing synchronization while maintaining query performance and data consistency.

Data Integration Architecture

A key advantage of the ELT (Extract, Load, Transform) architecture is its ability to simplify the data integration process by separating it into distinct steps: loading and then transforming data. By loading raw data into the Snowflake database before integration, organizations can "divide and conquer" the overall workflow, making it easier to manage and troubleshoot each stage independently.

For effective data integration, it is essential to transform the loaded data, and this is best accomplished using data transformation pipelines, which automate and orchestrate the necessary transformations reliably and repeatably. Another important benefit of the ELT approach is that all data transformation and integration processes take place directly within the target Snowflake environment. This enables fast, on-demand, and highly scalable operations by leveraging Snowflake's compute capabilities, and it eliminates the need for, and associated costs of, external data processing infrastructure.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.