Matillion uses the Extract-Load-Transform (ELT) approach to delivering quick results for a wide range of data processing purposes: everything from customer behaviour analytics, financial analysis, and even reducing the cost of synthesising DNA.
The OData Query component in Matillion ETL for Amazon Redshift presents an easy-to-use graphical interface that lets you connect to an OData service and extract data into a table in your Amazon Redshift Database. The component supports OData versions 2.0, 3.0 and 4.0.
Watch our tutorial video for a demonstration on how to set up and use the OData Query component in Matillion ETL for Amazon Redshift.
What is OData?
Most software products have a REST API of some kind, to interact with and if possible extract data from it. Each vendor tends to have their own implementation so it could be a tedious exercise to learn the specifics of each before you can build processes to interact with them (CRUD).
OData standard was proposed by Microsoft in 2007 to help standardize the way in which RESTful services could be built.
OData (Open Data Protocol) is a REST-based protocol for querying and updating data and is built on standardized technologies such as HTTP, Atom/XML, and JSON. It is different from other REST-based web services in that it provides a uniform way to describe both the data and the data model.
OData RESTful APIs are easy to consume. The OData metadata, a machine-readable description of the data model of the APIs, enables the creation of powerful generic client proxies and tools.
You can read more about OData on the OData website. Additionally, visit the OData API Explorer to see how a tool can utilise the standards based approach to query, interact and retrieve data from an OData service using RESTful calls over HTTP/S.
As with other load components in Matillion, the OData component extracts data to a Staging bucket in S3 and then loads it into Amazon Redshift.
Here’s what happens –
- OData Query component issues a REST call to the OData Service URL
- OData Service returns the requested data
- Matillion parses the data and streams it to a preconfigured S3 bucket
- Once all data is in S3, a COPY command is issued to Amazon Redshift
- Amazno Redshift Loads data from S3
The files in S3 are automatically deleted after the load operation.
The component supports various authentication mechanisms. A complete list of supported Auth Scheme methods can be found on the support portal. The default is NONE but you may change it as necessary based on your requirements. The most common are BASIC and OAuth.
To use BASIC authentication, set the Username and Password options on the component properties thus enabling Matillion to automatically set the AuthScheme property to BASIC.
Using the OData Query component in Matillion ETL for Amazon Redshift
OData Query component is an Orchestration component and can be used in an Orchestration Job.
I will use the Northwind OData test service for our example –
- Start by adding the component to an orchestration job.
- Set authentication as appropriate. Northwind service does not require authentication so ignore username and password.
- Set the URL to the OData service endpoint.
- Select the Data Source. Matillion will query the OData service and present you with a list of Tables.
- Views and Navigational Properties. Select one and Click OK.
- Select the Columns you need under Data Selection. Again, Matillion will present you with a list of columns it identified by analysing the metadata from OData service.
- Set the Target Table.
- Set S3 Staging Area.
Here’s what it may look like –
The component is destructive meaning it drops and recreates the Target table. To avoid losing data in the next run, use a Transformation job to move the data away from Target table into another more persistent table.
The image above shows two OData components pulling data into staging tables and then we use a Transformation job to combine these two sources and populate another more persistent table in Amazon Redshift.
Use the Connection Options section to describe the various options you can use when establishing a connection to control the behaviour of the service and the data it returns.
Read about the available Connection Options on our support portal.
The Connection Option NavigationPropertiesAsViews is useful for accessing data in OData services that lack foreign key references.
This property can be useful for OData services that can return related collections of entities, or navigation properties. Some OData entities, however, you can only access through navigation properties. NavigationPropertiesAsViews will cause all of the discovered navigation properties to be added as views.
This connection option defaults to true, and we list available Navigation-properties as views in the Data source list in the format ParentTable_NavigationProperty. You may set this to false to ignore navigational properties and display just tables and views.
Retrieving Data from Limited OData APIs
In most cases, NavigationPropertiesAsViews should stay on and the resulting views so you can access it like any other table or view. However, some OData APIs have limitations that require you to specify the primary key of the parent record when querying a navigation property. For example:
SELECT * FROM Categories_Products WHERE Categories_CategoryId='1'
You will also need to set SupportsExpand to false. You can find more information on this API limitation in the documentation for the property.
In conclusion, we hope that gave you a good overview of the OData Query component in Matillion ETL for Amazon Redshift. Please refer to the next section for links to some additional content.
[callout-content title=”Want to try the OData Query component in Matillion ETL for Amazon Redshift? Arrange a free 1-hour training session now, or start a free 14-day trial.”]
[button title=”Get Free Training Session” link=”http://info.matillion.com/redshift-etl-free-training-offer”]
[button title=”Get 14-day Free Trial” link=”./etl-for-redshift/free-trial/” style=”ghost”]