Bringing Google Analytics data into Snowflake using Matillion ETL for Snowflake

Bringing Google Analytics data into Snowflake using Matillion ETL for Snowflake

Matillion uses the Extract-Load-Transform (ELT) approach to delivering quick results for a wide range of data processing purposes: everything from customer behaviour analytics, financial analysis, and even reducing the cost of synthesizing DNA.

The Google Analytics Query component in Matillion ETL for Snowflake presents an easy-to-use graphical interface, enabling you to connect to live Google Analytics Data. Many of our customers are using this service for example, to track customers journey through the website, to monitor site usage and to monitor website speeds. The component allows you to bring the Google Analytics data into Snowflake for analysis and integration with in-house datasets.

The connector is completely self-contained: no additional software installation is required. It’s within the scope of an ordinary Matillion ETL for Snowflake license, so there is no additional cost for using the features.

See the Google Analytics Query component in action

Check out the video tutorial below to see the Google Analytics Query component in Matillion ETL for Snowflake in action.



Before configuring the Google Analytics Query component, you will need to configure OAuth to allow Matillion ETL for Snowflake to access your Google Analytics data. This is done in the Project -> Manage OAuth menu. Further details are available here. Then back in the Google Analytics Query component, choose the Google Analytics OAuth profile in the Authentication property.




Data Source

The next step is to choose the data to be loaded into Snowflake in the Data Source drop down. The Google Analytics API has 13 data sources to choose from. Select the relevant one from the drop down menu.




After choosing the data source, the next step is to choose the required fields from the data source in the data selection. This will form the new table which is created in Snowflake.




Please note that the Google API allows a maximum of 7 dimensions and 10 metrics to be selected in one API call, because of this limit the data sources do not behave like traditional tables and will not return everything when queried.

Connection Options

There is one mandatory connection option which must be specified for the Google Analytics Query component. The Profile defines which part of the website or app you can pull the analytics data from. This can be found in the Google Analytics Console in your web browser and is set in the Connection Options in the component properties.




Running the Google Analytics Query

Before the component can be run, a Target Table name needs to be given. This is the name of a new table which will be created to write the data into in Snowflake. Also a S3 Staging Area must be specified, this is a S3 bucket which is used to temporarily store the results of the query before it is loaded into Snowflake.

This component also has a Limit property, defaulting to 100, which can be used to force an upper limit on the number of records returned.

You can run the Orchestration job, either manually or using the Scheduler, to query your data from the Google Analytics API, and bring it into Snowflake.




Date Filters

By default the Google Analytics API will return data for the last 30 days. Start Date and End Date filters can be used to change the date range of the data. These can be set using the Data Source Filter component property.




Advanced mode

The Google Analytics Query component in Matillion ETL for Snowflake offers an “Advanced” mode instead of the default “Basic” mode. In Advanced mode, you can write a SQL-like query over all the available fields in the data model. This is automatically translated into the correct Google Analytics API calls to retrieve the data requested.




Custom dimension and metrics

The Google Analytics API has the functionality to include a dimension or metric in the data source which is not visible in the Data Selection options. This is done by selecting the Dimensions and/or Metrics fields in the Data Selection properties.




These values then need to be set in the Connection Options property:




The dimension values for keyword and region will now appear in a comma separated list with the column name ‘Dimensions’ in the Snowflake table.

Transforming the Data

Once the necessary data has been brought in from Google Analytics into Snowflake, it can then be used in a Transformation job.




In this way, you can build out the rest of your downstream transformations and analysis, taking advantage of Snowflake’s power and scalability.


Want to try the Google Analytics Query component in Matillion ETL for Snowflake? Arrange a free 1-hour training session now, or start a free 14-day trial.