Connecting Matillion ETL to Google BigQuery: A Guide
In Matillion ETL, the metadata for connecting to Google BigQuery is held in an artifact known as an Environment. Matillion ETL Environments can also hold additional information that is used during data extraction and loading.
Prerequisites for connecting to Google BigQuery
The prerequisites for connecting Matillion ETL to Google BigQuery are:
- Access to Matillion ETL for Google BigQuery
- Permission to create and edit Environments in Matillion ETL
- Your Matillion ETL instance is authorized via a service account with at least the following roles granted:
- Google Cloud Platform (GCP) service details:
- A GCP project with billing enabled
- A Google BigQuery dataset within the chosen project
You must enable APIs for the GCP project, including at least the following:
- The BigQuery API
- The Identity and Access Management (IAM) API
To check which APIs have been enabled, follow APIs and services from your GCP console like this:
Google BigQuery identifiers
All Google Cloud services – including Google BigQuery – are managed within the context of a Google Cloud project. You can find your project name from the dropdown list at the top of the GCP console as shown below. The dropdown lists all the projects you can access.
Every project has two alternative unique identifiers:
- Project ID – alphanumeric, you can influence the value when creating a project
- Project Number – long integer, generated automatically
To list all your project IDs and numbers you can use Google Cloud Shell. Launch it from the GCP console by following the Activate Cloud Shell link at the top right of the screen.
Once your cloud shell terminal becomes active, run a gcloud projects list command. You will find all the project IDs listed, along with their project number:
To list the datasets within your currently selected project, you can run a bq ls command from the cloud shell.
By default only the first 50 datasets are listed. If you have more than that, you can raise the limit by adding for example –max_results 500.
How to connect a Matillion ETL Environment to Google BigQuery
To create or edit a Matillion ETL Environment, start from the main Project menu, or from the context menu in the Environments panel.
This action will pop up a Create or an Edit Environment window.
In this dialog, choose a name for the Environment, and optionally set any AWS, Azure, or GCP credentials that you intend to use.
Then choose your Default Project and Default Dataset from the dropdown lists. While you are completing this information, make certain of two things:
- The dropdown lists for the Default Project and Default Dataset both show values that you can select
- After selecting the values, you get a success message when you press the Test button
If the dropdown lists are empty, you will need to re-check the prerequisites. It is likely that either your GCP project does not have the necessary APIs enabled, or that the service account is insufficiently privileged.
Press the Test button and make sure you see a success message before finishing the Environment Setup dialog
Connectivity metamodel for Matillion ETL and Google BigQuery
Matillion ETL stores Google BigQuery connection details inside an Environment object. The Environment contains the project ID and the dataset ID.
Environments can optionally also hold additional cloud provider information, in the form of AWS, Azure, and GCP credentials. This information is used by various orchestration components such as the Data Transfer.
Matillion ETL Environments exist within a Project and a Group. Orchestration and Transformation Jobs are versioned within Matillion ETL projects. Jobs always execute within the context of exactly one Environment.
For Google, the core administrative entity is the account, also known as an organization. You create projects within an account, and they are allocated a globally unique project ID. All project contents are completely isolated from each other by default.
You create Principals – such as Service Accounts – within projects. Service Accounts can be granted Roles, such as the ability to use Google BigQuery. Matillion uses Service Accounts to obtain GCP privileges at runtime.
You can enable individual APIs – such as the BigQuery API – for projects. Datasets can be created within a Project provided the BigQuery API has been enabled, and that billing has been switched on for the project.
Once you’ve connected Matillion ETL and Google BigQuery
You can start to build workflows in Matillion ETL
Learn more about using Matillion ETL with Google BigQuery
If you want to read more about how Matillion ETL and Google BigQuery work together in the cloud, download our ebook, Optimizing BigQuery: A Real-World Guide.
10 Best Practices for Maintaining Data Pipelines
Mastering Data Pipeline Maintenance: A Comprehensive GuideBeyond ...News
Matillion Adds AI Power to Pipelines with Amazon Bedrock
Data Productivity Cloud adds Amazon Bedrock to no-code generative ...Blog
Data Mesh vs. Data Fabric: Which Approach Is Right for Your Organization? Part 3
In our recent exploration, we've thoroughly analyzed two key ...