Using the MongoDB Query Component in Matillion ETL for Amazon Redshift
Matillion uses the Extract-Load-Transform (ELT) approach to delivering quick results for a wide range of data processing purposes: everything from customer behaviour analytics, financial analysis, and even reducing the cost of synthesising DNA.
The MongoDB Query component presents an easy-to-use graphical interface, enabling you to pull data from a MongoDB database directly into Amazon Redshift. Many of our customers are using this service to pull data from a local MongoDB into Amazon Redshift. Form there they can then combine this data with additional data for further analysis.
The connector is completely self-contained: no additional software installation is required. It’s within the scope of an ordinary Matillion license, so there is no additional cost for using the features.
Watch this short tutorial video to see how to use the MongoDB Query component in Matillion ETL for Amazon Redshift.
The first step in configuring the MongoDB Query component is to provide the name or the IP address of the MongoDB server. This server must be accessible from the Matillion EC2 instance. You can leave the Port, Username and Password properties blank to use the defaults port (27017). You will continue without username or password settings. After the server has been specified, the Database must be specified:
Matillion gives you the option to specify whether nested data objects should be flattened into one set of fields to be loaded into a table structure or to allow the nested objects to be returned as JSON strings.
Next, you will need choose the data you want to load into Amazon Redshift from the Data Source drop down. This is a list of collections in the MongoDB database specified above:
After choosing the Data Source, the subsequent step is to choose the required fields from the data source in the Data Selection property. Matillion scans through the data in the selected table to detect the fields. You can then select all, or a subset of the fields, to include. When you run the job in Matillion a new table will be created in Amazon Redshift with the selected fields.
The MongoDB Query component comes with some additional support parameters for you to configure. None of the Connection Options are mandatory with the MongoDB driver as sensible defaults are usually given. However, further details on the options are available here. When using Authentication with this component, it is recommended that you add a Connection Option called AuthDatabase and give the name of the MongoDB Authentication Database.
Running the MongoDB Query Component in Matillion ETL for Amazon Redshift
Before you can run the component, you need to name the Target Table. The Target Table you are naming is the new table being created in Amazon Redshift. Once created the data you are loading can be to written into the table. Also a S3 Staging Area must be specified. This is a S3 bucket which is used to temporarily store the results of the query before it is loaded into Amazon Redshift.
This component also has a Limit property. You can use the Limit property to force an upper limit on the number of records returned. We recommend using either a limit or a filter to reduce the number of rows returned because this will improve the speed of your job.
You can run the Orchestration job, either manually or using the Scheduler, to query your data and bring it into Amazon Redshift.
The MongoDB Query component offers an “Advanced” mode instead of the default “Basic” mode.
In Advanced mode, you can write a SQL-like query over all the available fields in the data model. This is automatically translated into the correct API calls to retrieve the data requested.
Transforming the Data
Once the required data has been brought in from the Mongo database into Amazon Redshift, it can then be used in a Transformation job, perhaps to enhance existing data:
In this way, you can build out the rest of your downstream transformations and analysis, taking advantage of Amazon Redshift’s power and scalability.
Want to try the MongoDB Query component in Matillion ETL for Amazon Redshift? Arrange a free demo, or start a free 14-day trial.