How to load data from Couchbase into Google BigQuery using Matillion ETL
Matillion uses the Extract-Load-Transform (ELT) approach to delivering quick results for a wide range of data processing purposes: everything from customer behavior analytics, financial analysis, and even reducing the cost of synthesizing DNA.
The Couchbase Query Component presents an easy-to-use graphical interface, enabling you to connect to Couchbase and pull tables from there into Google BigQuery. Many of our customers are using this service to bring Couchbase data into BigQuery to combine with other data. Couchbase Server, originally known as Membase, is an open-source, distributed multi-model NoSQL, document-oriented database software package optimized for interactive applications. These applications may serve many concurrent users by creating, storing, retrieving, aggregating, manipulating and presenting data.
The connector is completely self-contained: no additional software installation is required. It’s within the scope of an ordinary Matillion license, so there is no additional cost for using the features.
The first step in configuring the Couchbase Query component is to provide your server details to connect into Couchbase. Click on the 3 dots next to the server property, and then enter the Server IP or name into the pop-up box.
User and Password
Next, you will need to supply your Couchbase database User and Password. Once Couch base has authenticated your details, you can access the data. Following this step, you can complete the Data Source and Data Selection properties in Matillion.
Once you have access to the data you will be able to choose the data you want to load into BigQuery. The Data Source drop-down is a list of the tables available in your Couchbase database. You can, therefore, use this list to easily identify and select the data you want to load.
After choosing the Data Source, next, choose the required fields from the Data Source in the Data Selection. Within Matillion, the Data Selection property shows the list of the columns available in the Data Source you previously selected. The data you select here will populate a new table you name and create in BigQuery, which will discuss in a subsequent step.
Data Source Filter
If required, you can use a filter to limit or refine the returned data. This filter will run as the WHERE clause in Couchbase.
Running the Couchbase Query component in Matillion ETL for Google BigQuery
Before you can run the component, you need the name the Target Table for your selected data. This will be the name of a new table you are creating to write the data into in BigQuery. Also, a Cloud Staging Area must be specified. This is the Cloud Storage bucket that will be used temporarily to store the results of the query before it is loaded into BigQuery.
This component also has a Limit property which can be used to force an upper limit on the number of records returned.
You can run the Orchestration job, either manually or by using the Scheduler, to query your data and bring it into BigQuery.
The Couchbase Query component offers an “Advanced” mode in addition to the default “Basic” mode. In Advanced mode, you can write a SQL-like query over all the available tables. This is automatically translated into the correct API calls to retrieve the data requested.
Transforming the Data
Once you have brought in the required from Couchbase into BigQuery, you can use it in a Transformation job:
In this way, you can build out the rest of your downstream transformations and analysis, taking advantage of BigQuery’s power and scalability.