Videos

Using the Facebook Query component in Matillion ETL for Amazon Redshift

Facebook Query component in Matillion ETL for Amazon RedshiftFacebook is one of the major social media platforms and is a great resource for businesses of all sizes to engage with their audiences. Consequently, businesses can enhance their social footprint by creating a Facebook page and actively promoting it or create Ads using the Facebook Marketing platform to target specific audiences. In this blog post, we will look into understanding the Facebook Query component in Matillion ETL for Amazon Redshift and how to work with it.

Facebook provides various means by which businesses can measure audience engagement, advertisement performance, costs, conversions and various other metrics. There are tools (such as, Page insights, Ad Reports, etc) within the platform to access various metrics or analyse available data.

In addition to using the built-in tools, businesses can also extract data from the Facebook platform and load it into external systems. In this way businesses can analyze, mine or combine this data with data from other social/marketing platforms to derive new insights or even store it for future analysis.

Matillion’s Facebook Query component helps you pull data from Facebook into an Amazon Redshift table. Once the data is in Amazon Redshift, you may analyze using various visualization/dashboard tools, transform it to derive additional insights or combine it with data from other sources.

Video

Watch our tutorial video for a demonstration on how to set up and use the Facebook Query component in Matillion ETL for Amazon Redshift.

 

 

Data Extraction

Similarly to other load/unload components in Matillion, the Facebook component extracts data to an S3 Staging bucket and then loads it into Amazon Redshift.

 

Facebook Query component in Matillion ETL for Amazon Redshift - Data Extraction

 

Here’s what happens:

  1. Facebook Query component issues a REST call to the relevant Facebook API
  2. Facebook verifies the request and returns relevant data
  3. Matillion parses the data and streams it to a preconfigured S3 bucket
  4. Once all data is in S3, a COPY command is issued to Amazon Redshift
  5. Amazon Redshift Loads data from S3

Lastly, the files in S3 are automatically deleted after the load operation.

Authentication

The component uses OAuth to authenticate with Facebook. Here are some links to help you with the process.

  1. OAuth in Matillion
  2. Facebook 3rd party OAuth setup

If required, you may create multiple OAuth registrations each pointing to a different account in order to pull data from multiple accounts.

Please note, however, that Facebook provides mechanisms where an account owner can authorise a 3rd party Facebook account to extract data from their account. You may, however, be able to use a single account (and hence a single OAuth registration) to extract data from multiple accounts.

Facebook Graph and Marketing APIs

There are various APIs available from Facebook. The Facebook Query component focuses on extracting data from Objects exposed by Graph and Marketing APIs. Objects from these APIs are listed as data sources in the component’s data-sources property.

  • Graph API focuses on social aspects of the platform like Pages, Posts, Comments, Likes, Page Insights etc.
  • Marketing API focuses on the Ad-platform and Insights around it – AdAccounts, Campaigns, AdSets, Ads, Ad Insights, etc.

Whilst it is not necessary to understand the Graph and Marketing APIs it would be useful to understand how these work and what data is available from them.

Using the component

Facebook Query component is an Orchestration component and can be used in an Orchestration Job.

 

Facebook Query component in Matillion ETL for Amazon Redshift - Using the component

 

Start by adding the component to an orchestration job. Select the OAuth entry (authentication), the data source and the fields you’d like from the data source. Next, specify a Staging bucket (S3) and a target table. These are the minimum properties you must set to get started.

 

Facebook Query component in Matillion ETL for Amazon Redshift - Orchestration Job

 

The component is destructive in that it drops and recreates the Target table. To avoid losing data in the next run, use a Transformation job to move the data away from Target table into another more persistent table.

 

Facebook Query component in Matillion ETL for Amazon Redshift - Target Table

Data-sources and Targets
Some of the data sources require you to specify a ‘target’ because they are always owned by a parent object. An especially relevant example is is the ‘Comments’ view. Comments in Facebook are comments about a specific thing. They are always associated with a target, which is the item the comment is in response to. In this case the target could be a post, a picture, or a video.

See support documentation on Table Specific Information for an example of how a target may be specified. The example demonstrates using Advanced Mode where you can specify SQL-like queries. In Basic mode, use the ‘Data Source Filter’ to specify the target.

AuthenticateAsPage – Connection Option
Some of the data sources that are specific to a facebook-page require you to use a page-access-token to retrieve data. The driver manages this automatically if you set the AutheniticateAsPage Connection Option to the Page ID.

For example, to retrieve all photos uploaded to a page, select the Photos data source and set AuthenticateAsPage to the page-id of the page. This will return all photos uploaded to the page including the cover photo.

Facebook API Versions, Updates and APP Versions

Facebook update their APIs from time-to-time and may introduce new objects, enumerations or change behaviour of existing objects. This may affect the way the component works.

At the time of publishing this article, the Graph and Marketing APIs were at v2.10. Usually, a couple of versions are supported, which should allow existing applications to continue functioning without interruption.

See Facebook Platform Changelog for information on current version support.

All API requests to Facebook are required to specify the version number so they are serviced by the appropriate API endpoint. The Facebook Query component by default passes a version number, which typically depends on the API version at the time of release. For example, the current default is 2.10 (Matillion v1.29.9).

It is possible to query a different version of the Facebook API using the Version connection-option. This may be necessary if the Matillion Facebook component defaults to a previous version (say v2.9) of the API and you happen to create a new Facebook APP which will always default to the latest version (say v2.10) of the Graph/Marketing API. As a result, a new Facebook App will not support queries to an older version and hence you may need to use the Version connection option to query the v2.10 Facebook API endpoint.

Conclusion

We hope that gave you a good overview of the Facebook component. Please refer to the next section for links to some additional content which includes a video on using the component.

Useful Links

Facebook Query component in Matillion ETL for Amazon Redshift

Matillion OAuth Setup

Facebook Query Component

Component  Data Model

Integration information

Video

 

Want to try the Facebook Query component in Matillion ETL for Amazon Redshift? Arrange a free 1-hour training session now, or start a free 14-day trial.