Matillion uses the Extract-Load-Transform (ELT) approach to delivering quick results for a wide range of data processing purposes: everything from customer behaviour analytics, financial analysis, and even reducing the cost of synthesizing DNA.
The Google Analytics Query component presents an easy-to-use graphical interface, enabling you to connect to live Google Analytics Data. Many of our customers are using this service for example to track customer’s journey through the website, to monitor site usage and to monitor website speeds. The component allows you to bring the Google Analytics data into Amazon Redshift for analysis and integration with in-house data sets.
The connector is completely self-contained: no additional software installation is required. It’s within the scope of an ordinary Matillion license, so there is no additional cost for using the features.
Before configuring the Google Analytics Query component, you will need to configure OAuth to allow Matillion to access your Google Analytics data. This is done in the Project -> Manage OAuth menu.
Further details are available here. Then back in the Google Analytics Query component, choose the Google Analytics OAuth profile in the Authentication property.
The next step is to choose the data to be loaded into Amazon Redshift in the Data Source drop down. The Google Analytics API has 13 data sources to choose from. Select the relevant one from the drop down.
After choosing the data source, the next step is to choose the required fields from the data source in the Data Selection. This will form the new table which is created in Amazon Redshift.
Please note that the Google API allows a maximum of 7 dimensions and 10 metrics to be selected in one API call, because of this limit the data sources do not behave like traditional tables and will not return everything when queried.
There is one mandatory connection option which must be specified in order for the Google Analytics Query component to work. The Profile defines which part of the website or app you can pull the analytics data from. This can be found in the Google Analytics Console in your web browser and is set in the Connection Options in the component properties.
Running the Google Analytics Query
Before the component can be run, a Target Table name needs to be given. This is the name of a new table which will be created to write the data into in Amazon Redshift. Also an S3 Staging Area must be specified, this is an S3 bucket which is used to temporarily store the results of the query before it is loaded into Amazon Redshift.
This component also has a Limit property, defaulting to 100, which can be used to force an upper limit on the number of records returned.
You can run the Orchestration job, either manually or using the Scheduler, to query your data from the Google Analytics API, and bring it into Amazon Redshift.
By default the Google Analytics API will return data for the last 30 days. Start Date and End date filters can be used to change the date range of the data. These can be set using the Data Source Filter component property.
The Google Analytics Query component offers an “Advanced” mode instead of the default “Basic” mode. In Advanced mode, you can write a SQL-like query over all the available fields in the data model. This is automatically translated into the correct Google Analytics API calls to retrieve the data requested.
Custom dimension and metrics
The Google Analytics API has the functionality to include a dimension or metric in the data source which is not visible in the Data Selection options. This is done by selecting the Dimensions and/or Metrics fields in the Data Selection properties.
These values then need to be set in the Connection Options property:
The dimension values for keyword and region will now appear in a comma separated list with the column name ‘Dimensions’ in the Redshift table.
Transforming the Data
Once the necessary data has been brought in from Google Analytics into Amazon Redshift, it can then be used in a Transformation job.
In this way, you can build out the rest of your downstream transformations and analysis, taking advantage of Amazon Redshift’s power and scalability.