Extracting data from TigerGraph to Snowflake
Extracting data from TigerGraph is a vital step in enabling advanced analytics and cross-platform reporting by integrating your graph data with other systems such as Snowflake. To accomplish this efficiently and securely, there are several important configuration and connectivity considerations to address before initiating any data extraction or loading processes. In this article, we will guide you through the key steps involved in moving data from TigerGraph to Snowflake. You will learn how to create an appropriate identity within TigerGraph to ensure secure access to your data. For users employing Matillion, we will discuss how to check for and, if necessary, acquire the required JDBC driver for TigerGraph connectivity. We will also cover best practices for establishing and verifying reliable network connectivity between TigerGraph and Snowflake, ensuring data flows seamlessly from source to target. Finally, we will walk through strategies for both initial and incremental data querying, so that you can efficiently extract and load only the necessary data into Snowflake. Whether you are new to TigerGraph-Snowflake integrations or seeking to optimize your current workflows, this article will provide you with a comprehensive starting point.
What is TigerGraph?
TigerGraph is an enterprise-grade, distributed graph database designed to handle large-scale, complex, and highly-connected data. Optimized for speed and scalability, it employs a native parallel graph processing engine and a high-performance, property graph model that supports ACID transactions. TigerGraph’s GSQL language enables expressive querying and deep analytics across billions of vertices and edges, making it well-suited for use cases such as fraud detection, recommendations, supply chain analysis, and network monitoring. The platform provides rich integration capabilities, including streaming data ingestion and graph algorithms, and features built-in support for horizontal scaling, high availability, and workload management in both on-premises and cloud environments.
What is Snowflake?
Snowflake is a cloud-native data platform offering scalable data warehousing, analytics, and seamless data sharing. Unlike traditional databases, it uses a multi-cluster, shared-data architecture, letting compute and storage scale independently so multiple workloads can run without resource contention. Snowflake supports semi-structured data formats (JSON, Avro, Parquet), robust SQL analytics, and secure, role-based access controls. It runs on AWS, Azure, and Google Cloud for maximum flexibility and performance. These features make Snowflake well-suited for modern data engineering and analytics pipelines, especially for enterprise use cases requiring efficient, scalable, and secure data management across cloud environments.
Why Move Data from TigerGraph into Snowflake
Unlocking Analytics: The Benefits of Copying Data from TigerGraph to Snowflake
A data engineer or architect might choose to copy data from TigerGraph into Snowflake for several compelling reasons. First, TigerGraph often holds complex, highly connected data with significant analytical value, especially for use cases such as fraud detection, customer 360, or network analysis. However, the full value of this data is often realized only when it is integrated with other organizational data sources, such as transactional databases, data lakes, or third-party datasets, to enable richer analytics and deeper business insights. By transferring TigerGraph data into Snowflake, organizations can leverage Snowflake's powerful data integration and analytical capabilities to perform these comprehensive analyses without adding load to the operational TigerGraph system. This both preserves TigerGraph's performance for graph queries and moves large-scale data processing into an environment optimized for such operations.
Creating an Identity in TigerGraph
TigerGraph users (identities) are managed at the database level. System and database administrators can grant and control access using the built-in privilege and role-based access control model. The following instructions describe how to create a new user in TigerGraph using the `CREATE USER` statement in GSQL and configure their privileges.
Prerequisites
- Ensure you have administrative access to your TigerGraph system, typically as a user with the `superuser` or `admin` role.
- Open a GSQL shell session (`gsql`) as an authorized administrator.
1. Connect to GSQL
```bash
gsql
```
2. Create a New User
```
CREATE USER new_username WITH PASSWORD "strongpassword";
```
- Replace `new_username` with the desired username.
- Replace `"strongpassword"` with a secure password.
Example:
```
CREATE USER analyst1 WITH PASSWORD "AnalystP@ss2024!";
```
3. (Optional) Assign the User to a Role
Assign a role to specify the privileges for the new user. For example, to grant the built-in read-only role:
```
GRANT ROLE globalreader TO analyst1;
```
Other built-in roles include `superuser`, `admin`, `designer`, `querywriter`, and `queryreader`, among others.
4. (Optional) Grant Database-Specific Privileges
You may also grant user privileges on a specific graph. For example, to assign the `designer` role on a graph named `MyGraph`:

```
GRANT ROLE designer ON GRAPH MyGraph TO analyst1;
```
5. Verify User Creation
List all users to verify:
```
SHOW USERS;
```
Notes
- Passwords must follow TigerGraph password policies (minimum length, complexity, etc.).
- Administrative users can revoke roles using the `REVOKE ROLE ... FROM ...` syntax.
- For full role and privilege details, consult the TigerGraph Documentation > User Management.
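As a hedged illustration of the revoke syntax mentioned above, using the role and user created earlier, the statement might look like the following; verify the exact form against your TigerGraph version, since graph-scoped roles may also require an `ON GRAPH` clause:

```
REVOKE ROLE globalreader FROM analyst1;
```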
Reference: TigerGraph GSQL - CREATE USER Statement
Installing the JDBC Driver
At the time of writing, the TigerGraph JDBC driver is not bundled with Matillion Data Productivity Cloud by default. This is due to licensing or redistribution restrictions associated with the TigerGraph driver. To enable connectivity between Matillion’s Data Productivity Cloud and your TigerGraph database, you'll need to manually obtain and install the driver.
1. Download the TigerGraph JDBC Driver
- Visit the TigerGraph official documentation link for JDBC drivers: TigerGraph JDBC Driver Download.
- Look for a Type 4 JDBC driver. Type 4 drivers are pure Java drivers, which do not require native libraries and are generally the preferred choice for most JDBC integrations.
2. Upload the Driver to Matillion Data Productivity Cloud
After downloading the appropriate driver .jar file, you'll need to add it as an external driver in Matillion Data Productivity Cloud.
- Follow the step-by-step instructions available here: Uploading External JDBC Drivers to Matillion Data Productivity Cloud.
- Ensure that you add the .jar file for the TigerGraph Type 4 JDBC driver.
- Once uploaded, configure the agent to make sure the driver is available for use with database connections within the platform.
3. Configure and Use the TigerGraph Connection
Once the driver is installed and available, you can proceed to set up your TigerGraph connection within Matillion's Data Productivity Cloud Designer.
- Refer to these usage instructions: Working with Database Query components in Matillion Designer.
- Create a new database connection using the newly uploaded TigerGraph JDBC driver, then use this connection in database query components as required by your workflow.
Be sure to consult both TigerGraph and Matillion documentation periodically for any updates related to driver compatibility, installation processes, or best practices.
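For orientation, the open-source TigerGraph JDBC driver (published in the tigergraph/ecosys project) typically accepts connection URLs of the form shown below. Treat this as a sketch: the hostname is a placeholder, port 14240 is the common default for TigerGraph's REST endpoint, and the exact URL scheme and supported connection properties depend on your driver version, so verify them against the driver's own documentation.

```
URL:      jdbc:tg:http://your-tigergraph-host:14240
username: analyst1
password: <your-password>
graph:    MyGraph
```

The `username` and `graph` values here reuse the example user and graph created earlier in this article; substitute your own.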
Checking network connectivity
To ensure seamless connectivity between Matillion Data Productivity Cloud and your TigerGraph database, you must configure your TigerGraph instance to allow incoming network connections. The required configuration depends on your deployment type:
Full SaaS Agent Configuration: If you are deploying in a Full SaaS agent model, you must permit incoming connections from the IP addresses used by Matillion SaaS agents. Refer to the official list of allowed IP addresses here: Matillion Allowed IP Addresses. Ensure these IP addresses are added to your TigerGraph database's access controls, firewall rules, or network security groups as appropriate.
Hybrid SaaS Deployment: For Hybrid SaaS deployments, the network traffic to TigerGraph originates from your own cloud environment, specifically your virtual private cloud (VPC). In this case, configure TigerGraph to allow incoming connections from your VPC's IP range(s). You can verify connectivity and obtain helpful utilities using the tools provided here: Matillion Exchange: Pipeline Network Check.
DNS Resolution: If your TigerGraph database is referenced via a DNS hostname (rather than a direct IP address), ensure that the Full SaaS or Hybrid SaaS agent can resolve the DNS name to the correct IP address. The agent must have network access to a DNS server that can resolve the hostname for successful connections.
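To sanity-check DNS resolution and port reachability from the agent's network before configuring a pipeline, you can run a few standard commands. This is a minimal sketch assuming a hypothetical hostname and the default TigerGraph REST port (14240); substitute your own endpoint and any non-default port:

```bash
# Replace with your TigerGraph hostname and port
TG_HOST=your-tigergraph-host.example.com
TG_PORT=14240

# 1. Verify the agent can resolve the hostname via DNS
nslookup "$TG_HOST"

# 2. Verify the TCP port is reachable (3-second timeout)
timeout 3 bash -c "</dev/tcp/$TG_HOST/$TG_PORT" \
  && echo "port reachable" \
  || echo "port not reachable"
```

If the DNS lookup fails in a Hybrid SaaS deployment, check that the agent's VPC is configured with a DNS server that can resolve the hostname.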
Querying Data from TigerGraph Database
This guide explains to technical users how to query data from a TigerGraph database, especially in the context of data integration and ETL processes using familiar SQL `SELECT`-style statements.
SQL-Style Query Examples
TigerGraph uses its proprietary graph query language, GSQL, but data integrations often require SQL-like statements to interface with ETL tools such as those from Matillion. When querying TigerGraph through these tools, queries are usually expressed in a form similar to a SQL `SELECT`. Here are a few examples:
```
-- Example: Selecting all Person vertices
SELECT * FROM Person;

-- Example: Selecting edges of type 'Friend' with properties
SELECT * FROM Friend;

-- Example: Selecting properties from Movie vertices
SELECT title, release_date, genre FROM Movie;

-- Example: Filtering Actors by age
SELECT name, age FROM Actor WHERE age > 40;
```
Note: Under the hood, TigerGraph does not support SQL, but provides JDBC/ODBC connectors or REST endpoints that enable SQL-like querying for external tools.
Datatype Conversion between TigerGraph and Snowflake
When loading data from TigerGraph into environments such as Snowflake, expect datatype conversion. Some property types in TigerGraph may not have exact equivalents in Snowflake. For example:
| TigerGraph Type | Closest Snowflake Type |
|---|---|
| INT | NUMBER |
| DOUBLE | FLOAT |
| STRING | VARCHAR |
| DATETIME | TIMESTAMP_NTZ/TZ/LTZ |
| BOOL | BOOLEAN |
Best practice: Review column mapping and datatype conversions to ensure integrity during ETL jobs.
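For example, a Snowflake staging table for the Person vertices queried earlier might be declared using the type mappings above. The column names here are illustrative assumptions, not a schema mandated by either platform:

```
-- Illustrative Snowflake DDL applying the TigerGraph-to-Snowflake type mappings
CREATE TABLE person_stage (
    person_id     NUMBER,          -- from TigerGraph INT
    name          VARCHAR,         -- from TigerGraph STRING
    score         FLOAT,           -- from TigerGraph DOUBLE
    last_modified TIMESTAMP_NTZ,   -- from TigerGraph DATETIME
    is_active     BOOLEAN          -- from TigerGraph BOOL
);
```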
Initial Load vs. Incremental Load Patterns
A robust pattern to import data efficiently is to combine an initial "full" load with ongoing incremental loads. Use the same Database Query component (e.g., in Matillion) for both, but vary your query's filter clause.
1. Initial (Full) Load
On the first load, the Database Query component runs without a filter clause, extracting all relevant data:
```
-- Initial load: select ALL records
SELECT * FROM Person;
```
2. Incremental Loads
For subsequent imports, apply a filter to select only new or updated records, often based on a timestamp or version field:
```
-- Incremental load: select only records updated or created since the last load
SELECT * FROM Person WHERE last_modified > '2024-05-01 00:00:00';
```
Adjust the timestamp based on your last successful incremental sync.
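Rather than hard-coding the timestamp, many teams store the high-water mark in a pipeline variable and interpolate it into the query. As a sketch, assuming a hypothetical variable named `last_sync_ts` (Matillion pipelines reference variables with `${...}` syntax), the incremental query might look like:

```
-- Incremental load driven by a pipeline variable (variable name is illustrative)
SELECT * FROM Person WHERE last_modified > '${last_sync_ts}';
```

After each successful run, update the variable to the maximum `last_modified` value loaded, so the next run picks up only newer records.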
For detailed strategies on incremental load implementation patterns, see Matillion Exchange: Incremental Load Data Replication Strategy.
By following this approach, you efficiently manage data extraction from TigerGraph to downstream systems (e.g., Snowflake), supporting both bulk onboarding and ongoing synchronization while maintaining query performance and data consistency.
Data Integration Architecture
A key advantage of the ELT (Extract, Load, Transform) architecture is its ability to simplify the data integration process by separating it into distinct steps: loading and then transforming data. By loading raw data into the Snowflake database before integration, organizations can “divide and conquer” the overall workflow, making it easier to manage and troubleshoot each stage independently. For effective data integration, it is essential to transform the loaded data, and this is best accomplished using data transformation pipelines, which automate and orchestrate the necessary transformations reliably and repeatedly. Another important benefit of the ELT approach is that all data transformation and integration processes take place directly within the target Snowflake environment. This enables fast, on-demand, and highly scalable operations by leveraging Snowflake’s compute capabilities, and it eliminates the need for—and associated costs of—external data processing infrastructure.
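As a sketch of the "transform inside Snowflake" step, a transformation pipeline might merge newly loaded raw rows into an integrated table. The schema and column names below are assumptions for illustration, not a prescribed layout:

```
-- Illustrative Snowflake MERGE: integrate staged TigerGraph rows into a curated table
MERGE INTO analytics.person AS tgt
USING staging.person_stage AS src
    ON tgt.person_id = src.person_id
WHEN MATCHED THEN UPDATE SET
    tgt.name          = src.name,
    tgt.last_modified = src.last_modified
WHEN NOT MATCHED THEN INSERT (person_id, name, last_modified)
    VALUES (src.person_id, src.name, src.last_modified);
```

Because the merge runs entirely on Snowflake compute, it scales with the warehouse size and requires no external processing infrastructure.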