How to load data from Couchbase into Amazon Redshift using Matillion ETL

How to load data from Couchbase into Amazon Redshift using Matillion ETL

couchbase matillion etl for amazon redshift

Couchbase Server, originally known as Membase, is an open-source, distributed multi-model NoSQL document-oriented database software package that is optimized for interactive applications. These applications may serve many concurrent users by creating, storing, retrieving, aggregating, manipulating and presenting data.

In support of these kinds of application needs, Couchbase Server is designed to provide easy-to-scale key-value or JSON document access with low latency and high sustained throughput. It is designed to be clustered from a single machine to very large-scale deployments spanning many machines.

In this article we will be looking at how to extract data from the Couchbase server, from on-premise or Cloud to your Amazon Redshift data warehouse.

Couchbase Query Component

Matillion’s Couchbase Query Component can be used to run an SQL Query on an accessible database and copy the result to a staging table, via S3 Staging buckets. Any SQL that’s valid for the source database may be used.

Prerequisites

  • The Couchbase Server is accessible from the Matillion instance

 

Connectivity

Couchbase Server are either hosted in-house secured by a DMZ; or even within your VPC as a server on AWS. In all these cases, ensure the Matillion instance can reach the service by name or IP address.

Native Support

This Couchbase Query component retrieves data on a Couchbase server and loads it into a Redshift table. This stages the data, so the table is reloaded each time. You may then use transformations to enrich and manage the data in permanent tables.

Warning: This component is destructive as it truncates or recreates its target table on each run. Do not modify the target table structure manually.

Workflow

Below is a typical workflow for importing data from a database into Redshift

  1. Couchbase connects and executes SQL against the Couchbase Server
  2. Data is read from the Couchbase server and streamed to S3 Staging buckets (Data is never written to disk)
  3. Matillion copies the data from S3 Stage buckets to to Redshift using “Copy” command.

 

couchbase query matillion etl amazon redshift

 

Using the Couchbase Query Component

The Couchbase Component can be found under the “Load/Unload” folder in the components panel. The following image shows the properties for a configured component.

 

couchbase query matillion etl amazon redshift properties

 

The options Server, User and Password, Connection Options cover the details required to establish the connection to your Couchbase Server.

Connection Options are additional properties you can define against your connection. For example UseSSL and SSLServerCert options may be set to enforce communication over a secure channel.
In the basic mode you can provide Data source and Data Selection (column list) that will pull data from Couchbase.

In the Advanced mode you can use SQL Query that will query the database. Any variable definitions are replaced with appropriate values and the query issued against your Couchbase Server.
S3 Staging Area is a bucket where the results will be held prior to loading into Redshift.

Schema, Target Table, Distribution Style, Sort key identify the staging table on Redshift into which the data is loaded. The Target Table is dropped and rebuilt every time the component is run.

This table should be treated as Staging table and data should be moved into a Persistent table as soon as possible.

Orchestration Job

The following image shows a orchestration job that pulls data from a Couchbase Server into a target table in Redshift using both basic and advanced mode and then copied the data to persistent table using the transformation jobs.

 

couchbase query matillion etl amazon redshift transformation

 

Conclusion

Hope that gave you a good overview of the Couchbase Query Component. Give it a go and let us know (support@matillion.com) if you need any help with this component.

Info

Want to try the Couchbase Query component in Matillion ETL for Amazon Redshift? Arrange a free 1-hour training session now, or start a free 14-day trial.