Blog| Matillion ETL for BigQuery

Using the Jira Query component in Matillion ETL for BigQuery

Matillion uses the Extract-Load-Transform (ELT) approach to delivering quick results for a wide range of data processing purposes: everything from customer behavior analytics, financial analysis, and even reducing the cost of synthesizing DNA.

The Jira Query component presents an easy-to-use graphical interface, enabling you to pull data from Atlassian Jira directly into Google BigQuery. Many of our customers are using this to help integrate Jira project management data with information held in other systems.

The connector is completely self-contained: no additional software installation is required. It’s within the scope of an ordinary Matillion license, so there is no additional cost for using the features.

Data Extraction

You can access this component from a Matillion Orchestration job. You can find Jira Query among the Load/Unload components when editing an Orchestration job.

 

Jira in Matillion ETL for BigQuery - Data Extraction

 

Data is extracted from your Jira online account, so before getting started you’ll need to know your URL. This is the same as the address you use to access your Jira account from a web browser:

 

Jira in Matillion ETL for BigQuery - Data Extraction URL

 

Use the initial part in the component’s URL property, as shown below.

 

Jira in Matillion ETL for BigQuery - Data Extraction URL in Matillion

 

When configuring the component, it’s normally best to work from the top property downwards, since some of the prompts have a cascading effect on others lower down.

Authentication

The Jira Query component requires authentication, in the form of a login username and password.

You’ll find your Username in the Profile part of the online portal. Your password will be as created by the administrator unless you have changed it since then! Use these values in the Username and Password properties.

 

Jira in Matillion ETL for BigQuery - Authentication

 

For the password, you have the option of using Matillion’s built-in Password Manager.

 

Jira in Matillion ETL for BigQuery - PW manager

 

Data Source

Atlassian makes a number of tables and views available to query through their API. These are listed in the Data Source dropdown property within Matillion ETL. Choose the one you want to query.

 

Jira in Matillion ETL for BigQuery - Data Source

 

You can see a full list of available data sources in the Data Model.

Data Selection

Once you have made your choice of Data Source, you can then open the Data Selection property, and choose which items you want to bring into BigQuery.

 

Jira in Matillion ETL for BigQuery - Data Selection

 

The items you choose in this dialog box will become columns in your BigQuery table at runtime.

Data Source Filter

This is an optional field: you don’t have to specify a filter, and if you leave it blank the component will simply copy all the source records into BigQuery.

Alternatively, you may choose to filter on a field, for example like this:

 

Jira in Matillion ETL for BigQuery - Data Source Filter

 

Another commonly-used filter is a date range, for example, a last-modified date, used with Incremental or High Water Mark data loading.

If you specify more than one filter, you may choose to combine them with an AND or with an OR condition.

 

Jira in Matillion ETL for BigQuery - Data Source Filter and or

 

Take advantage of the Limit clause during testing. Its default value of 100 is a useful row-stop. Once you are happy with the other settings, simply remove the Limit value and the component will load all records matching the filter criteria.

Advanced Mode and Connection Options

The Jira Query component will normally meet all your requirements in its basic mode of operation, as described in the previous few sections.

Behind the scenes, your dialog options are used to create a SQL-like query, which is sent to the Atlassian API for evaluation. Under some circumstances, for example under the guidance of Matillion support, you may choose to use

  • Advanced Mode – where you specify the SQL query directly
  • Connection Options – which allow fine-tuning of low-level settings
  • Load Options – to alter the default behavior of the data loading stage

However, it’s usually best to leave these options unchanged and perform any necessary data manipulations inside Google BigQuery.

Running the Jira Query component in Matillion ETL for BigQuery

There are two final, mandatory properties that you still need to set:

  • Target Table
  • Cloud Storage Staging Area

Target Table

This is the name of the BigQuery table that Matillion will create on your behalf.

 

Jira in Matillion ETL for BigQuery - Target Table

 

If you leave the Project and Dataset at their default value of [Environment Default] then the table will be created in the Project and Dataset that you chose for your Environment.

Alternatively, you can choose a hardcoded value from the dropdown list, or even a Matillion variable.

The Target Table will be created every time the component runs. If it happened to exist already, then by default it will be dropped and re-created. This is because you can change the Data Source and Data Selection at any time, and Matillion must create a table with the correct columns.

Note that the Target Table is usually not the final destination of your data. It’s just used to temporarily store newly-loaded data, ready for integration into the final data model. For this reason, we recommend you use a naming standard, such as an STG_ prefix, to differentiate Target Tables in Load components. More on this subject in Transforming the Data.

Cloud Storage Staging Area

This option is used during the mechanics of data loading. Files will only exist in this location for a short period of time. It’s best to:

  • Choose a Cloud Storage bucket which is normally empty
  • Parameterize the value, so you can easily use it in many places

In order to parameterize the value, you must create a Matillion variable, give it a default value, and then reference the variable in the Cloud Storage Staging Area.

To create the variable, go to Project / Manage Environment Variables, and create a new variable for example named cs_staging.

 

Jira in Matillion ETL for BigQuery - Cloud Storage Staging area

 

Ensure that you have supplied a default value for the Environment you’re intending to use. In the above example, there’s a default of gs://yourbucket/ for the Dev environment.

To use the variable in your Jira Query component, replace any hardcoded value with the variable name, wrapped inside ${}, for example, ${cs_staging}

 

Jira in Matillion ETL for BigQuery - Cloud Storage Staging area 2

 

Look for a green OK next to the Cloud Storage Area property. This indicates that the variable’s default value has been applied and is valid.

Once you have a row of green OK markers in the properties, and the component outline changes to green and you are ready to run it!

 

Jira in Matillion ETL for BigQuery - run component

 

Watch in the panel at bottom right to check the progress of the task.

 

Jira in Matillion ETL for BigQuery - task

 

It should show a green tick once it has finished.

Transforming the Data

Having run the Jira Query component, you should then be able to find the newly-created table in your BigQuery console.

 

Jira in Matillion ETL for BigQuery - Transformation

 

You’re ready to start work on a new Transformation Job, which reads stg_issues as a Table Input and transforms the data ready to merge into its final destination.

Useful Links

 
 

Want to try the Jira Query component in Matillion ETL for BigQuery? Request a 1-hour free demonstration, or start a 14-day free trial.