
Using the Twitter Query Component in Matillion ETL for Snowflake

Matillion's Twitter Query component enables you to extract data from your Twitter account and load it into Snowflake. The component follows the ELT principle: data from diverse sources is first loaded into a single Snowflake database, and from there you can take advantage of Snowflake's power and scalability to transform, join, and aggregate it into shape for reporting or analytics.

Video

Watch our tutorial video for a demonstration of how to set up and use the Twitter Query component in Matillion ETL for Snowflake: https://youtu.be/u0qf5nJHLAc

Data extraction architecture

As with most of Matillion's Load/Unload components, acquiring data from Twitter is a two-stage process.

Stage 1: Matillion connects to Twitter's API using OAuth credentials. Before using the Twitter Query component you first need to set up your OAuth credentials; this is a one-off setup task. Once authenticated, Matillion brings the data from Twitter across the network.

Stage 2: Matillion instructs Snowflake to bulk-load the extracted data. This is the fastest end-to-end way to load the Twitter data into Snowflake.
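Under the hood, stage 2 amounts to a Snowflake bulk load from staging files written to S3. As a rough sketch of the equivalent Snowflake SQL (Matillion generates and runs the real statements for you; the bucket, credential placeholders, and column list below are illustrative assumptions, not values taken from the product):

    -- Illustrative sketch only: Matillion builds and runs the real statements.
    -- Bucket, credential placeholders and column list are assumptions.
    CREATE OR REPLACE TABLE twitter_stage (
        id         VARCHAR,
        text       VARCHAR,
        created_at TIMESTAMP_NTZ
    );

    COPY INTO twitter_stage
    FROM 's3://my-staging-bucket/matillion/tmp/'   -- the S3 Staging Area
    CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...')
    FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"');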

Data Orchestration

The Twitter Query is an Orchestration component. To use it, create an Orchestration job and edit it, then locate the Twitter Query in the component list (under Load/Unload > Social Networks) and drag it onto the Orchestration job canvas. It's always best to work down the property list from top to bottom. Some of the key settings are as follows:
  • Authentication - set this to the name of the OAuth credentials you intend to use
  • Data Source - choose a table or view from those available in the Twitter data model. There are around 20 main elements in the Twitter data model, including Tweets, Direct Messages, Followers and Lists
  • Data Selection - choose the columns, depending on which Data Source you’re using
  • Data Source Filter - you should normally enter some filters to restrict the data to the rows of interest
  • Limit - it’s a good idea to set this to avoid rate limiting errors
  • S3 Staging Area - choose one of your existing S3 buckets. It will be used temporarily during the bulk load (stage 2, as mentioned above)
  • Target Table - choose a name for the new Snowflake table. Note this is a “staging” table, so you’ll need to move the data on after loading (see the next section)
After all the properties have been configured correctly, the border of the component turns green, indicating that it's ready to run.
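For intuition, the Data Source, Data Selection, Data Source Filter and Limit properties combine much like the clauses of a SQL query. This is only a conceptual sketch; Matillion translates the properties into Twitter API calls, and the names and values below are hypothetical:

    -- Conceptual equivalent of the component's properties; not real generated SQL.
    SELECT id, text, created_at           -- Data Selection
    FROM tweets                           -- Data Source
    WHERE created_at >= '2024-01-01'      -- Data Source Filter (hypothetical)
    LIMIT 500;                            -- Limit, to help avoid rate limiting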

Running the Extract and Load with the Twitter Query component in Matillion ETL for Snowflake

Matillion offers various ways to run Orchestration jobs, including its own built-in scheduler, which you can use to run jobs automatically. There is also an SQS integration that lets you synchronize Matillion jobs with an external scheduler. During testing and development, however, you'll probably just want to run the job interactively with a right-click on the canvas. The actual run time will vary according to the complexity of the query, the amount of data returned, and whether Twitter applies rate limiting. Once it's working, you can add the Orchestration job to your daily schedule; the Snowflake table will then be dropped and recreated every day, containing all the newly matching data.
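During testing, a quick query in Snowflake is a handy way to confirm what each run loaded. A minimal check, reusing the hypothetical twitter_stage table and columns from the earlier sketch:

    -- Sanity-check the staged load (table and columns are the earlier assumptions)
    SELECT COUNT(*)        AS row_count,
           MIN(created_at) AS earliest,
           MAX(created_at) AS latest
    FROM   twitter_stage;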

Staging and Transformations

The Target Table property you chose names a Snowflake "staging" table, so it is recreated every time the Twitter Query component runs; if the table already exists, it is silently dropped first. This is deliberate: it lets you take advantage of Snowflake's fast bulk loader and build incremental loads on top. However, it does mean you need to copy the newly loaded data to a more permanent location after every load. The usual pattern is to call a Transformation job immediately after the load.

Data returned from an API will very likely not conform to your required schema. To cleanse the data, introduce transformations such as fixing datatypes and removing duplicates; Matillion's Data Quality Framework document provides a detailed guide. A typical example is extracting additional information from existing columns using a Calculator component.

In this example, the newly loaded staging data has a meaningful Twitter "ID" column containing two pieces of information: a timestamp and an ordinal sequence number starting from 1. To extract the date part, use a SUBSTRING in the Calculator expression. To extract the ordinal number, take advantage of Snowflake's sophisticated regular expression handling in a second Calculator expression. Cleansing and enhancing the data in this way makes it ready for blending with data from other sources.
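In plain Snowflake SQL, those two Calculator expressions look roughly like the following. The ID format shown (a 'YYYY-MM-DD hh:mm:ss' timestamp followed by ':n') is an assumption for illustration, so adjust the expressions to your actual data:

    -- Hypothetical Calculator-style expressions; assumes id values such as
    -- '2016-06-02 06:20:07:3' (a timestamp, then an ordinal sequence number).
    SELECT id,
           SUBSTRING(id, 1, 10)         AS trend_date,     -- the date part
           REGEXP_SUBSTR(id, '[0-9]+$') AS trend_ordinal   -- the trailing ordinal
    FROM   twitter_stage;

A follow-on INSERT or MERGE can then move the cleansed rows into a permanent table.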

Useful Links

  • Twitter Query component in Matillion ETL for Snowflake
  • Component Data Model
  • OAuth Set Up
  • Integration information
  • Video

Begin your data journey

Want to try the Twitter Query component in Matillion ETL for Snowflake? Arrange a free 1-hour training session now, or start a free 14-day trial.
 
Ian Funnell

Data Alchemist

Ian Funnell, Data Alchemist at Matillion, curates The Data Geek weekly newsletter and manages the Matillion Exchange.