Videos

Using the OData Query component in Matillion ETL for Google BigQuery

Using the OData Query component in Matilltion ETL for Google BigQueryMatillion uses the Extract-Load-Transform (ELT) approach to delivering quick results for a wide range of data processing purposes: everything from customer behaviour analytics, financial analysis, and even reducing the cost of synthesising DNA.

The OData Query component in Matillion ETL for Google BigQuery presents an easy-to-use graphical interface that lets you connect to an OData service and extract data into a table in your Google BigQuery Database. The component supports OData versions 2.0, 3.0 and 4.0.

Video

Watch our tutorial video for a demonstration on how to set up and use the OData Query component in Matillion ETL for Google BigQuery.

 

 

What is OData?

Most software products have a REST API (of some kind) to interact-with and if possible extract data from it. Each vendor tends to have their own implementation. Therefore, it could be a tedious exercise to learn the specifics of each before you can build processes to interact with them (CRUD).

OData standard was proposed by microsoft in 2007 to help standardise the way in which RESTful services could be built.

OData (Open Data Protocol) is a REST-based protocol for querying and updating data and is built on standardized technologies such as HTTP, Atom/XML, and JSON. It is different from other REST-based web services in that it provides a uniform way to describe both the data and the data model.

OData RESTful APIs are easy to consume. The OData metadata, a machine-readable description of the data model of the APIs, enables the creation of powerful generic client proxies and tools.

You can read more about OData on the OData website.  Additionally, visit the OData API Explorer to see how a tool can utilise the standards based approach to query, interact and retrieve data from an OData service using RESTful calls over HTTP/S.

The OData Query component from Matillion lets you connect to an OData service and extract data into a table in your Google BigQuery. The component supports OData versions 2.0, 3.0 and 4.0.

Data Extraction

As with other load components in Matillion, the OData component extracts data to a Staging bucket in Google Cloud Storage and then loads it into BigQuery.

 

Odata Query component in Matilltion ETL for Google BigQuery Architecture

 

Here’s what happens –

  1. OData Query component issues a REST call to the OData Service URL
  2. OData Service returns the requested data
  3. Matillion parses the data and streams it to a preconfigured Cloud Storage bucket
  4. Once all data is in a Cloud Storage Bucket, a LOAD command is issued to Google BigQuery
  5. BigQuery Loads data from Cloud Storage Bucket

The files in Cloud Storage Bucket are automatically deleted after the load operation.

Authentication

The component supports various authentication mechanisms. You can find a complete list of supported Auth Scheme methods on the support portal. The default is NONE but you may change it as necessary based on your requirements. The most common are BASIC and OAuth.

To use BASIC authentication, set the Username and Password options on the component properties thus enabling Matillion to automatically set the AuthScheme property to BASIC.

Using the OData Query component in Matillion ETL for Google BigQuery

OData Query component is an Orchestration component and can be used in an Orchestration Job.

 

Using the Odata Query component in Matilltion ETL for Google BigQuery
I will use the Northwind OData test service for our example –

  • Start by adding the component to an orchestration job.
  • Set authentication as appropriate. Northwind service does not require authentication so ignore username and password.
  • Set the URL to the OData service endpoint.
  • Select the Data Source. Matillion will query the OData service and present you with a list of Tables, Views and Navigational Properties. Select one and Click OK.
  • Select the Columns you need under Data Selection. Again, Matillion will present you with a list of columns it identified by analysing the metadata from OData service.
  • Set the Target Table.
  • Set Cloud Storage Staging Area.

Here’s what it may look like –

 

Odata Query component in Matilltion ETL for Google BigQuery Example

 

The component is destructive, it drops and recreates the Target table. To avoid losing data in the next run, use a Transformation job to move the data away from Target table into another more persistent table.

 

Odata Query component in Matilltion ETL for Google BigQuery Transformation

 

The image above shows two OData components pulling data into staging tables and then we use a Transformation job to combine these two sources and populate another more persistent table in Google BigQuery.

 

Odata Query component in Matilltion ETL for Google BigQuery Transformation 2

 

Connection Options

Use the Connection Options section to describe the various options you can use when establishing a connection to control the behaviour of the service and the data it returns.

Read about the available Connection Options on our support portal.

Navigation Properties

The Connection Option NavigationPropertiesAsViews is useful for accessing data in OData services that lack foreign key references.

This property can be useful for OData services that can return related collections of entities, or navigation properties. Some OData entities, however, you can only access through navigation properties. NavigationPropertiesAsViews will cause all of the discovered navigation properties to be added as views.

This connection option defaults to true, and we list available Navigation-properties as views in the Data source list in the format ParentTable_NavigationProperty. You may set this to false to ignore navigational properties and display just tables and views.

Retrieving Data from Limited OData APIs

In most cases, NavigationPropertiesAsViews should stay on and the resulting views so you can access it like any other table or view. However, some OData APIs have limitations that require you to specify the primary key of the parent record when querying a navigation property. For example:

 SELECT * FROM Categories_Products WHERE Categories_CategoryId='1'

 

You will also need to set SupportsExpand to false. You can find more information on this API limitation in the documentation for the property.

Conclusion

In conclusion, we hope that gave you a good overview of the OData Query component in Matillion ETL for Google BigQuery. Please refer to the next section for links to some additional content.

Useful Links

OData Query component in Matillion ETL for Google BigQuery
OData Query Component
OData Component Connection Options
Integration information
Video

Want to try out the OData Query component in Matillion ETL for BigQuery? Request a free demo now, or launch on the Google Cloud Launcher.