Couchbase Server, originally known as Membase, is an open-source, distributed multi-model NoSQL document-oriented database software package that is optimized for interactive applications. These applications may serve many concurrent users by creating, storing, retrieving, aggregating, manipulating and presenting data.
In support of these kinds of application needs, Couchbase Server is designed to provide easy-to-scale key-value or JSON document access with low latency and high sustained throughput. It is designed to be clustered from a single machine to very large-scale deployments spanning many machines.
In this article we will be looking at how to extract data from the Couchbase server, from on-premise or Cloud to your Amazon Redshift data warehouse.
Watch our tutorial video for a demonstration on how to set up and use the Couchbase Query Component in Matillion ETL for Amazon Redshift.
Couchbase Query Component
Matillion’s Couchbase Query Component can be used to run an SQL Query on an accessible database and copy the result to a staging table, via S3 Staging buckets. Any SQL that’s valid for the source database may be used.
- The Couchbase Server is accessible from the Matillion instance
Couchbase Server are either hosted in-house secured by a DMZ; or even within your VPC as a server on AWS. In all these cases, ensure the Matillion instance can reach the service by name or IP address.
This Couchbase Query component retrieves data on a Couchbase server and loads it into a Redshift table. This stages the data, so the table is reloaded each time. You may then use transformations to enrich and manage the data in permanent tables.
Warning: This component is destructive as it truncates or recreates its target table on each run. Do not modify the target table structure manually.
Below is a typical workflow for importing data from a database into Redshift
- Couchbase connects and executes SQL against the Couchbase Server
- Data is read from the Couchbase server and streamed to S3 Staging buckets (Data is never written to disk)
- Matillion copies the data from S3 Stage buckets to to Redshift using “Copy” command.
Using the Couchbase Query Component
The Couchbase Component can be found under the “Load/Unload” folder in the components panel. The following image shows the properties for a configured component.
The options Server, User and Password, Connection Options cover the details required to establish the connection to your Couchbase Server.
Connection Options are additional properties you can define against your connection. For example UseSSL and SSLServerCert options may be set to enforce communication over a secure channel.
In the basic mode you can provide Data source and Data Selection (column list) that will pull data from Couchbase.
In the Advanced mode you can use SQL Query that will query the database. Any variable definitions are replaced with appropriate values and the query issued against your Couchbase Server.
S3 Staging Area is a bucket where the results will be held prior to loading into Redshift.
Schema, Target Table, Distribution Style, Sort key identify the staging table on Redshift into which the data is loaded. The Target Table is dropped and rebuilt every time the component is run.
This table should be treated as Staging table and data should be moved into a Persistent table as soon as possible.
The following image shows a orchestration job that pulls data from a Couchbase Server into a target table in Redshift using both basic and advanced mode and then copied the data to persistent table using the transformation jobs.
Hope that gave you a good overview of the Couchbase Query Component. Give it a go and let us know (firstname.lastname@example.org) if you need any help with this component.