- 09.14.2017
- Data Fundamentals, Dev Dialogues
Using the Twitter Query Component in Matillion ETL for Amazon Redshift

Video
Watch our tutorial video for a demonstration on how to set up and use the Twitter Query Component in Matillion ETL for Amazon Redshift: https://youtu.be/dmwnQSdmIeU
Data extraction architecture
Similar to most of Matillion’s Load/Unload components, acquiring data from Twitter is a two-stage process.
Data Orchestration
The Twitter Query is a data Orchestration component, so to use it you need to create an Orchestration job and open it for editing. Locate the Twitter Query in the component list, drag it onto the Orchestration job canvas, and set the following properties:
- Authentication - set this to the name of the OAuth credentials you authenticated in Step 1 above
- Data Source - choose a table or view from those available in the Twitter data model. There are approximately 20 main elements in the Twitter data model, including Tweets, Direct Messages, Followers and Lists
- Data Selection - choose the columns depending on which Data Source you’re using
- Data Source Filter - filters are normally used to restrict the data of interest
- Target Table - choose a name for the new Amazon Redshift table. Note this is a “staging” table, so you’ll need to move the data on after loading (see the next section)
- S3 Staging Area - choose one of your existing Amazon S3 buckets. It will be used temporarily during the bulk load (stage 2, as mentioned above)
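
If you want to keep a record of these settings outside the Matillion UI, the property list above can be written down as a plain data structure. The sketch below is purely illustrative: the `TwitterQueryConfig` class, its field names and the example values are hypothetical and are not part of any Matillion API. It also assumes the Data Source Filter can be left empty, which the list above suggests but does not state outright.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TwitterQueryConfig:
    """Hypothetical record of the Twitter Query properties listed above."""

    authentication: str        # name of the OAuth credentials entry
    data_source: str           # table or view in the Twitter data model
    data_selection: List[str]  # columns to fetch from the data source
    target_table: str          # name of the Redshift staging table
    s3_staging_area: str       # existing S3 bucket used during the bulk load
    # Assumption: filters are optional, so this defaults to an empty list.
    data_source_filter: List[str] = field(default_factory=list)

    def missing(self) -> List[str]:
        """Return the names of required properties that were left empty."""
        required = {
            "authentication": self.authentication,
            "data_source": self.data_source,
            "data_selection": self.data_selection,
            "target_table": self.target_table,
            "s3_staging_area": self.s3_staging_area,
        }
        return [name for name, value in required.items() if not value]
```

For example, `TwitterQueryConfig("my_twitter_oauth", "Tweets", ["ID", "Text"], "", "my-staging-bucket").missing()` reports that `target_table` still needs a value. The credential, column and bucket names here are placeholders.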

Running the Extract and Load
Matillion offers various ways to run Orchestration jobs, including its own built-in scheduler and an integration with Amazon SQS. During testing and development, however, you’ll probably want to run it interactively with a right-click on the canvas.
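
For the SQS route, a job run is requested by placing a JSON message on a queue that Matillion listens to. The sketch below only builds such a message body. The field names (`group`, `project`, `version`, `environment`, `job`, `variables`) are an assumption based on the message format described in Matillion's SQS documentation, so check them against the documentation for your version. The helper `build_run_request` and all example values are hypothetical.

```python
import json


def build_run_request(group, project, version, environment, job, variables=None):
    """Build a JSON message body asking for one Orchestration job to be run.

    The key names are an assumption about the expected message format;
    verify them against the SQS documentation for your Matillion version.
    """
    message = {
        "group": group,              # project group containing the job
        "project": project,          # project name
        "version": version,          # version name
        "environment": environment,  # environment to run against
        "job": job,                  # Orchestration job name
        "variables": variables or {},
    }
    return json.dumps(message)
```

The resulting string would then be sent to the queue with whichever SQS client you already use.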
Staging and Transformations
The Target Table property you chose is the name of an Amazon Redshift “staging” table. That means it will be recreated every time the Twitter Query component runs. If the table already exists, it will be silently dropped first. It’s deliberately designed this way so you can do incremental loads and take advantage of Amazon Redshift’s fast bulk loader. However, it does mean that you need to copy the newly-loaded data into a more permanent location after every load. The usual pattern is to call a new Transformation job immediately after the Load.
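
The staging pattern described above can be sketched in a few lines. This is a minimal illustration, not how Matillion implements it: SQLite stands in for Amazon Redshift so the example runs anywhere, and the table names `stg_tweets` and `tweets`, the two columns, and the delete-then-insert merge are all assumptions made for the demo. In Matillion the second step would be a Transformation job rather than hand-written SQL.

```python
import sqlite3


def load_staging(conn, rows):
    """Mimic the component: drop and recreate the staging table on every run."""
    conn.execute("DROP TABLE IF EXISTS stg_tweets")
    conn.execute("CREATE TABLE stg_tweets (id INTEGER, text TEXT)")
    conn.executemany("INSERT INTO stg_tweets VALUES (?, ?)", rows)


def merge_into_permanent(conn):
    """Copy staged rows into the permanent table, replacing rows with the same id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets (id INTEGER PRIMARY KEY, text TEXT)"
    )
    # Remove rows that are about to be reloaded, then append the staged rows.
    conn.execute("DELETE FROM tweets WHERE id IN (SELECT id FROM stg_tweets)")
    conn.execute("INSERT INTO tweets SELECT id, text FROM stg_tweets")
    conn.commit()


def run_demo():
    """Run two incremental loads and return the permanent table's contents."""
    conn = sqlite3.connect(":memory:")
    load_staging(conn, [(1, "first"), (2, "second")])
    merge_into_permanent(conn)
    # The second load overlaps the first on id 2 and adds id 3.
    load_staging(conn, [(2, "second, edited"), (3, "third")])
    merge_into_permanent(conn)
    return conn.execute("SELECT id, text FROM tweets ORDER BY id").fetchall()
```

After the two loads in `run_demo`, the permanent table holds ids 1, 2 and 3, with id 2 carrying the text from the second load. The staging table only ever contains the most recent batch.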

Useful Links
- Twitter Query Component in Matillion ETL for Amazon Redshift
- Component Data Model
- OAuth Set Up
- Integration information
- Video
Begin your data journey
Want to try the Twitter Query component in Matillion ETL for Amazon Redshift? Arrange a free 1-hour training session now, or start a free 14-day trial.
Ian Funnell
Manager of Developer Relations