Using the Jira Query component in Matillion ETL for BigQuery
Data Extraction

You can access this component from a Matillion Orchestration job: you'll find Jira Query among the Load/Unload components when editing the job. Data is extracted from your Jira online account, so before getting started you'll need to know your URL. This is the same as the address you use to access your Jira account from a web browser: use the initial part of that address in the component's URL property. When configuring the component, it's normally best to work from the top property downwards, since some of the prompts have a cascading effect on others lower down.
Authentication

The Jira Query component requires authentication in the form of a login username and password. You'll find your Username in the Profile section of the online portal. Your password is the one created by your administrator, unless you have changed it since then. Use these values in the Username and Password properties. For the password, you have the option of using Matillion's built-in Password Manager.
Data Source

Atlassian makes a number of tables and views available to query through their API. These are listed in the Data Source dropdown property within Matillion ETL. Choose the one you want to query. You can see a full list of available data sources in the Data Model.
Data Selection

Once you have made your choice of Data Source, open the Data Selection property and choose which items you want to bring into BigQuery. The items you choose in this dialog box will become columns in your BigQuery table at runtime.
Data Source Filter

This is an optional field: you don't have to specify a filter, and if you leave it blank the component will simply copy all the source records into BigQuery. Alternatively, you may choose to filter on a field. Another commonly used filter is a date range, for example a last-modified date, used with Incremental or High Water Mark data loading. If you specify more than one filter, you may choose to combine them with an AND or an OR condition. Take advantage of the Limit clause during testing: its default value of 100 is a useful row-stop. Once you are happy with the other settings, simply remove the Limit value and the component will load all records matching the filter criteria.
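As a rough sketch of how a filter, a date range, and the Limit clause fit together, the SQL-like query the component builds behind the scenes might look something like the following. The data source and field names here are invented for illustration, not taken from the actual Jira data model:

```sql
-- Illustrative only: 'Issues', 'Status', and 'Updated' are assumed
-- names, standing in for whatever Data Source and fields you chose.
SELECT IssueId, Summary, Status, Updated
FROM Issues
WHERE Status = 'Done'
  AND Updated >= '2023-01-01'  -- e.g. a High Water Mark date filter
LIMIT 100                      -- the default row-stop while testing
```

Removing the `LIMIT 100` line corresponds to clearing the Limit value once you are happy with the other settings.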
Advanced Mode and Connection Options

The Jira Query component will normally meet all your requirements in its basic mode of operation, as described in the previous few sections. Behind the scenes, your dialog options are used to create a SQL-like query, which is sent to the Atlassian API for evaluation. Under some circumstances, for example under the guidance of Matillion support, you may choose to use:
- Advanced Mode - where you specify the SQL query directly
- Connection Options - which allow fine-tuning of low-level settings
- Load Options - to alter the default behavior of the data loading stage
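In Advanced Mode, for instance, you type the SQL-like statement yourself instead of letting the dialog build it. A minimal sketch, again using assumed data source and column names rather than the component's real data model:

```sql
-- Advanced Mode sketch: the statement is written directly.
-- 'Issues' and its columns are illustrative names only.
SELECT IssueId, Summary, Assignee
FROM Issues
WHERE Assignee IS NOT NULL
```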
Running the Jira Query component in Matillion ETL for BigQuery

There are two final, mandatory properties that you still need to set:
- Target Table
- Cloud Storage Staging Area
Target Table

This is the name of the BigQuery table that Matillion will create on your behalf. If you leave the Project and Dataset at their default value of [Environment Default], the table will be created in the Project and Dataset that you chose for your Environment. Alternatively, you can choose a hardcoded value from the dropdown list, or even a Matillion variable. The Target Table is created every time the component runs: if it happens to exist already, by default it will be dropped and re-created. This is because you can change the Data Source and Data Selection at any time, and Matillion must create a table with the correct columns. Note that the Target Table is usually not the final destination of your data. It's just used to temporarily store newly loaded data, ready for integration into the final data model. For this reason, we recommend you use a naming standard, such as an STG_ prefix, to differentiate Target Tables in Load components. More on this subject in Transforming the Data.
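To make the staging idea concrete, a downstream integration step might fold the freshly staged rows into a persistent table using BigQuery's standard MERGE statement. This is only a sketch of the pattern: the project, dataset, table, and column names below are all hypothetical, and the actual transformation would be built in a Matillion Transformation job.

```sql
-- Hypothetical integration step: STG_JIRA_ISSUES stands in for the
-- Target Table this component loads; DIM_ISSUES is an invented
-- persistent destination table.
MERGE `my_project.my_dataset.DIM_ISSUES` AS tgt
USING `my_project.my_dataset.STG_JIRA_ISSUES` AS src
  ON tgt.IssueId = src.IssueId
WHEN MATCHED THEN
  UPDATE SET tgt.Summary = src.Summary, tgt.Updated = src.Updated
WHEN NOT MATCHED THEN
  INSERT (IssueId, Summary, Updated)
  VALUES (src.IssueId, src.Summary, src.Updated)
```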
Cloud Storage Staging Area

This option is used during the mechanics of data loading. Files will only exist in this location for a short period of time. It's best to:
- Choose a Cloud Storage bucket which is normally empty
- Parameterize the value, so you can easily use it in many places