Integrate data from Google Firestore to Snowflake using Matillion

Our Google Firestore to Snowflake connector seamlessly transfers your data to Snowflake within minutes, ensuring it remains current without requiring manual coding or complex ETL scripting.

Google Firestore
Snowflake
Google Firestore to Snowflake banner

Extracting data from Google Firestore to Snowflake

Extracting data from Google Firestore is a valuable process for organizations seeking to leverage their operational data for analytics and reporting. One common destination for such data is Snowflake, a cloud data warehouse known for its scalability and performance. However, moving data between these platforms involves a number of important considerations. In this article, we will guide you through the essential steps for extracting data from Google Firestore and loading it into Snowflake. We will begin by explaining how to create an appropriate identity within Google Firestore to facilitate secure access. For those using Matillion—an ETL tool popular for its integration capabilities—we will cover how to check for the necessary JDBC driver or obtain it if required. Additionally, we will discuss the best practices for network connectivity between your source and target systems to ensure a smooth data transfer. Finally, we will explore how to query data from Firestore, covering both the initial full extraction and strategies for incremental data loading, enabling you to keep your Snowflake data warehouse up to date efficiently and reliably.


What is Google Firestore?

Google Firestore is a fully managed, scalable NoSQL cloud database offered as part of Google Cloud Platform and Firebase. Designed for complex, real-time applications, Firestore stores data in flexible, hierarchical collections of documents, enabling structured queries, dynamic indexing, and robust support for offline operations. It offers strong consistency guarantees and multi-region replication for high availability and durability. Firestore integrates seamlessly with Google’s authentication and security services, supports ACID transactions, and provides SDKs for mobile, web, and server environments. This makes it a suitable choice for applications that demand real-time synchronization and rapid development without the overhead of managing database infrastructure.

matillion logo x Google Firestore

What is Snowflake?

Snowflake is a cloud-native data platform for high-performance data warehousing, secure sharing, and analytics. It separates storage and compute, allowing independent scaling and cost optimization based on workload. Running on AWS, Azure, and Google Cloud, Snowflake offers cross-cloud compatibility and high availability. Supporting structured and semi-structured data (JSON, Avro, Parquet), it enables flexible ingestion and SQL-based querying. Features like automatic clustering, zero-copy cloning, and data time travel streamline data management and development. Popular among data engineers and analysts, Snowflake delivers agility and strong performance for modern data applications.

Why Move Data from Google Firestore into Snowflake

Unlocking Analytics: The Value of Replicating Google Firestore Data into Snowflake

Google Firestore often stores data that is potentially valuable for analytics, such as customer interactions or application events. However, this value is fully realized when data from Firestore is combined with information from other sources, enabling comprehensive reporting, richer insights, and improved decision-making. By copying data from Google Firestore into Snowflake, data engineers and architects can seamlessly integrate and analyze diverse datasets without adding significant load to Firestore itself. Snowflake’s scalable architecture is purpose-built for analytical workloads, ensuring that intensive queries and complex data transformations do not impact the performance or reliability of operational systems like Firestore. This approach helps organizations maximize the utility of their data while maintaining efficient and stable operations.

Creating an Identity in Firestore

Google Firestore is a NoSQL document-oriented database and does not support traditional SQL statements such as

CREATE USER
. Instead, to "create a user," you typically add a user document with relevant fields in a
users
collection. The following instructions assume you are using Node.js and the official Firebase Admin SDK to interact with Firestore, but the logic applies similarly to other environments.

Prerequisites

  • Node.js environment
  • Firebase project and service account credentials (instructions)
  • Firebase Admin SDK installed:
    npm install firebase-admin

Steps to Create a User Document

  1. Initialize the Admin SDK

```js const admin = require('firebase-admin');

admin.initializeApp({
  credential: admin.credential.applicationDefault()
});

const db = admin.firestore();

```

  1. Define the User Data

    You can define any schema your application requires. Here is an example:

js
    const userData = {
      uid: "unique-user-id",
      email: "[email protected]",
      displayName: "John Doe",
      createdAt: admin.firestore.FieldValue.serverTimestamp(),
    };

  1. Add the User to the

    users
    Collection

    Here is how to create or overwrite a user document:

```js async function createUser(userData) { const { uid, ...rest } = userData; await db.collection('users').doc(uid).set(rest); console.log("User created with UID:", uid); }

createUser(userData).catch(console.error);

```

- This will create a new document with the ID set to the user's UID under the `users` collection.
- If a document with the same UID already exists, it will be overwritten with the new data.

Example Equivalent in Firestore Console

If you wish, you may manually create a user document via the Firestore Console:

  • Navigate to Cloud Firestore in the Firebase Console.
  • Select the users collection (or create it if it does not exist).
  • Click Add document.
  • Enter a unique user ID as the Document ID, and add fields such as
    email
    ,
    displayName
    , etc.
  • Click Save.

Firestore vs. SQL Analogy

While you might be accustomed to using an SQL script to create a user, e.g.,

INSERT INTO users (uid, email, displayName, createdAt)
VALUES ('unique-user-id', '[email protected]', 'John Doe', NOW());

In Firestore, the process involves adding a document rather than executing an SQL statement. The outcome—adding an identity record into your user table/collection—is functionally equivalent.


Note: User authentication is handled via Firebase Authentication. Creating a user in Firestore does not grant authentication capabilities. To create a Firebase Authentication user, use the Admin SDK’s

createUser()
method and link the user metadata as appropriate in the Firestore collection.

Installing the JDBC driver

At the time of writing, the JDBC driver for Google Firestore is not included by default in Matillion Data Productivity Cloud. This is due to licensing or redistribution restrictions. As a result, you must download and install the driver manually before you can use Firestore as a source or target in your Matillion workflows.

Follow the steps below to obtain and install the appropriate Firestore JDBC driver and make it available in Matillion Data Productivity Cloud.

1. Download the Firestore JDBC Driver

  1. Go to the official Simba JDBC and ODBC Driver download page for Firestore:
    https://www.simba.com/drivers/firestore-jdbc-odbc/
  2. Locate the section for the JDBC driver for Google Firestore.
  3. Ensure you download the Type 4 JDBC driver package, as this does not require any native dependencies and is suitable for cloud-based environments like Matillion.

Note: You may be required to register or accept terms and conditions before accessing the download.

2. Upload the Driver to Matillion Data Productivity Cloud

Since Matillion does not bundle the Firestore JDBC driver, you will need to upload it manually. Please follow the official Matillion instructions at:
https://docs.matillion.com/data-productivity-cloud/agent/docs/uploading-external-drivers/

The process typically involves:

  • Accessing your Matillion Data Productivity Cloud environment.
  • Navigating to the locations or interfaces where you can manage external JDBC drivers.
  • Uploading the Firestore driver JAR file that you downloaded in the previous step.
  • Ensuring the driver is enabled and available for use by the relevant connections or agents.

Refer closely to the provided URL for step-by-step guidance specific to your deployment and version.

3. Configure and Use the Firestore JDBC Driver in Matillion

Once the driver is uploaded and registered, you can begin using it within your Matillion workflows to connect to Firestore. For details on how to create database connections, author database queries, and leverage the new driver within the Matillion environment, refer to: https://docs.matillion.com/data-productivity-cloud/designer/docs/database-query/

Make sure to use the connection dialogs and settings appropriate for your Firestore environment, and select the desirable driver from your list of installed JDBC drivers.

Checking network connectivity

To ensure successful network connectivity between Matillion Data Productivity Cloud and your Google Firestore database, you must verify that your database will accept incoming connections according to your deployment type:

  • Full SaaS Agent Configuration:
    You must allow incoming connections from the IP addresses used by Matillion's Full SaaS platform. The current list of necessary IP addresses is maintained at this URL. Please ensure all listed IP addresses have access to your Google Firestore database.

  • Hybrid SaaS Deployment:
    In a Hybrid SaaS setup, your database must allow incoming connections from your own virtual private cloud (VPC), where the Hybrid SaaS agent is running. To assist with configuration and troubleshooting, refer to the utilities available at this page.

Additionally, if your Google Firestore database is referenced with a DNS (host name), you must ensure that the Matillion agent (whether Full SaaS or Hybrid SaaS) is able to resolve the address correctly. This may require appropriate configuration of your network’s DNS resolution for the agent’s environment.

Querying Data from a Google Firestore Database

When working with Google Firestore and exporting its data for analytics or warehousing (for example, to Snowflake), you need to express your queries and understand how data types might transform across the systems. This guide provides examples and outlines the recommended approach for initial and incremental data loads using Firestore.


1. Example Firestore Queries Expressed as SQL SELECT Statements

Although Google Firestore does not use SQL natively, it's helpful to conceptualize Firestore queries with SQL-equivalent

SELECT
statements when planning data movements and transformations.

Retrieve All Documents From a Collection

Firestore query:

js
db.collection('users').get()

Equivalent SQL statement:

SELECT * FROM users;


Retrieve Documents with a Conditional Filter

Firestore query:

js
db.collection('orders').where('status', '==', 'pending').get()

Equivalent SQL statement:

SELECT * FROM orders WHERE status = 'pending';


Retrieve Documents and Order by a Field

Firestore query:

js
db.collection('products').orderBy('created_at', 'desc').get()

Equivalent SQL statement:

SELECT * FROM products ORDER BY created_at DESC;


2. Datatype Conversion Between Google Firestore and Snowflake

Firestore stores data in a flexible, NoSQL format, whereas Snowflake expects strict SQL datatypes. Common conversion considerations include:

Firestore Type Snowflake Datatype
string VARCHAR
number NUMBER, FLOAT
boolean BOOLEAN
timestamp TIMESTAMP_NTZ
array VARIANT
map/object VARIANT (JSON)

You may need to explicitly cast or transform fields during ingestion, especially for nested objects and arrays.


3. Initial and Incremental Load Pattern

For robust and efficient data replication, use a two-step load strategy:

  • Initial Load: Full extraction and loading of all records into your data warehouse.
  • Incremental Load: Only load new or changed records since the last successful load (commonly using a timestamp or an incrementing ID).

Use the same Database Query component for both operations, simply varying the filter condition.

Initial Load Example

No filter clause — fetches ALL records:

SELECT * FROM users;
(Firestore:
db.collection('users').get()
)


Incremental Load Example

With a filter clause, often by a timestamp:

Assume you are tracking the last successful sync timestamp as

:last_sync_timestamp
.

SELECT * FROM users WHERE updated_at > :last_sync_timestamp;
(Firestore:
db.collection('users').where('updated_at', '>', lastSyncTimestamp).get()
)

See Matillion documentation on incremental load strategies for advanced use cases.


Note: Matillion’s Database Query component makes it easy to use the same transformation logic for both initial and incremental pipelines—a recommended best practice.

Data Integration Architecture

Loading data in advance of integration is a core feature of the ELT (Extract, Load, Transform) architecture, and represents a divide-and-conquer strategy that separates data loading from data integration and transformation. By first loading raw data into Snowflake, organizations can independently manage the ingestion and subsequent transformation stages, addressing each challenge more effectively. Data integration invariably involves transformation operations such as data cleansing, normalization, and aggregation; these are most effectively accomplished through dedicated data transformation pipelines. Another notable advantage of the ELT approach is that both data transformation and integration processes occur natively within the Snowflake database. This native execution means transformations are not only fast and scalable—leveraging Snowflake's elastic computing resources on demand—but also eliminates the need for separate data processing infrastructure, resulting in a streamlined architecture and significant cost savings.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.