- New Data Loading orchestration components:
- All Data Loading orchestration components can now display data directly sampled from their source, allowing users to easily check their current configuration.
- All Data Loading orchestration components can now display the SQL generated by the component.
- CDC based on DMS gives users the power to automatically keep their Redshift tables in-sync with their source database tables using Matillion ETL.
- Support for creating External Tables over semi-structured external data hosted on S3.
- A new Nested Data Load component can flatten incoming nested data according to a user-defined structure. Unlike nested data, the flattened data can be fully-functional on Redshift.
- A new JDBC Table Metadata to Grid Component connects to many types of JDBC database and can export the metadata from a source table into a Matillion ETL Grid Variable.
- A new Migration Tool can help move any number of Matillion ETL assets directly from one Matillion ETL instance to another.
Note: The preferred (and safest) way to upgrade is now to launch a new copy of Matillion ETL running the latest version, use the Migration Tool to move and validate the new version, before deleting the existing instance. In-place upgrade may be removed in future versions.
- A selection of new components to connect to various services:
- Shopify Query connects to the user’s Shopify account.
- Survey Monkey Query loads data from a SurveyMonkey database.
- Zoho CRM Query retrieves Zoho CRM data.
- Dynamics 365 Sales Query connects to the Sales service in Dynamics 365.
- Dynamics 365 Business Central Query connects to the Business Central services in Dynamics 365.
- Many UX improvements including automatically connecting components on the canvas, improved variables workflow and new keyboard shortcuts.
- ORC and PARQUET file formats now supported in S3 Load.
- Selected Environments are now user-specific. Users can now specify their environments independently of one another.
- Users can now freely copy, cut, and paste jobs within a project.
- Autocompletion prompts now appear in many places when using Matillion ETL variables in code.
- Users can now search their environment tree to quickly find tables and views.
- New “Data Transfer” Component that boasts all the functionality of the existing S3 Get, S3 Put and Cloud Storage Put components, plus additional source and target destinations (Azure Blob Storage)
- In addition to AWS and GCP credentials, environments can now reference Azure credentials to interact with Azure services such as Blob Storage
- A new “Apache Hive Query” component connects to your Apache Hive data warehouse
- A new “LinkedIn Query” component connects to your company’s LinkedIn apps
- A new “Bing Search Query” component connects to the Bing Search API
- A new “Bing Ads Query” component connects to the Bing Ads service
- A new “Dynamics 365 Sales Query” component connects to the Dynamics 365
- Allow uploading the native Microsoft SQL Server JDBC Driver (the bundled jTDS driver is often the fastest in scenarios where it works)
- New ‘Extract To New Job’ function available by right-clicking a selection of multiple components on the canvas. Allows users to instantly create new jobs from a group of components, tidying up workflows and helping to create reusable jobs
- A new Table Properties panel ( accessible via a right-click on a table in the Environment tree) shows table metadata. For key distributed tables, a chart showing data skew is shown. For external (Spectrum) tables, file locations and partition information are shown
- The new “Salesforce Incremental Load” Wizard allows users to quickly and easily set up incremental loads from Salesforce
- The new “JDBC Incremental Load” Wizard allows users to quickly and easily create incremental loads from a variety of popular database types
- Shared Jobs:
- You can now turn your reusable orchestration jobs into their own components with their own parameters, help and Icon
- Shared jobs can be packaged and distributed across multiple ETL instances with Import and Export
- Historic Task Viewer:
- Previously completed tasks can be viewed on the canvas along with any parameter errors
- You can understand the canvas state of a job and also see the jobs contained in a Shared Job
- An “Unconditional” connector:
- Its now simpler to build orchestrations where the next orchestration step is run regardless of the success or failure of the prior step. This avoids use of extra “and” and “or” components to achieve the same thing
- “Auto Debug” for all Data Loading components:
- Data Loaders come with the Auto Debug property. When switched on, allows users to choose between 5 levels of Debug Logging verbosity
- Makes it easier to retrieve logging information without console access to the Matillion ETL Instance. Include these logs in your support requests for much faster turnaround!
- Warning: Can potentially consume large amounts of disk space. Do not leave this switched on unless directly in need of it!
- It is now possible to import, export and modify permissions via the API
- Users can select components within an orchestration job and ‘Extract To New Job‘ via the context menu, creating a new job from the selected components. A link to the new job (via a Run Orchestration component) replaces the selected components in the old job
- Window Calculation component now supports “Standard Deviation” and “Standard Deviation Population”
- Data Lineage:
- Allows you to understand the effect that your complex transformation jobs will have on your data. Track a column backwards to its source to determine where and how calculations are applied
- This is an Enterprise Feature and thus is available to customers using large and xlarge instance types
- OpenID Connect support for third party login providers:
- You can now configure Matillion ETL to authenticate with any Open ID Connect provider
- Default support for Google, Microsoft and Okta plus a “Generic” option
- Open Exchange Rates Query component connects to the Open Exchange Rates API
- Grid Iterator allow iterating the values of a Grid Variable, similarly to iterating through a table of values
- SQL Editor (in all Query components) now shows available Tables/Columns and Variables to help you author and test SQL queries from source systems
- A new “Notices” V1 API endpoint allows you to query the current system notifications and post new messages which notify all users
- New “User Configuration” and “Permission” V1 API endpoints allow user management via the Matillion API
- Matillion no longer requires “listAllBuckets” permission (although this is still recommended)
- Job Variables (scalar and grid) now have a “Visibility” that determines how they are used elsewhere
- All variables now have a description
- 100+ bug fixes across all areas of Matillion ETL
- When browsing recent STL Load Errors, more details are provided on the exact cause of parsing problems
- Enterprise Only: This version of Matillion ETL for Amazon Redshift introduces a new Permissions system that allows users to:
- Setup users with fine grained permission sets that can limit the 100+ core functions of the tool
- Provides default permission groups:
- Reader – Read only user who can’t modify a project
- Reader with Comments – Reader with ability to add notes to jobs
- Runner – A user how and execute but not modify jobs
- Scheduler – A user who can execute, schedule and change related config
- Writer – A user who can create ETL jobs but not delete projects
- Additional permission groups can be added at any time and are organised hierarchically making them easy to set up.
- A new suite of Grid Variable components are now included to make populating and manipulating them simpler – often without requiring any scripting:
- A new “SendGrid Query” component connects to the sendgrid email delivery platform
- A new “ElasticSearch Query” component to connect to the elasticsearch search engine
- A new “Magento Query” component to connect to the Magento content eCommerce system
- A new “Zuora Query” component to connect to the Zuora subscription software platform
- A new “GMail Query” component to connect to Google’s email service
- A new “Run Now” action has been added when defining a schedule
- Double-clicking a component on the canvas now opens the components “default” editor, if it has one. For example, double-clicking a Bash Script component will begin editing the script.
- Internal User: When using the “internal” security option tomcat user passwords are now hashed when stored on disk.
- Alter WLM Slots Component
- This allows queries run by Matillion to make use of multiple WLM slots to give very complex queries additional memory without having to spill intermediate results to disk. You can determine which queries spill to disk with SVL_QUERY_SUMMARY and similar system views.
- This can be increased and decreased during a flow, so only those parts of the ETL that require it consume additional resources
- External (Domain-based) Login: You can now encrypt your Realm Password with the AWS Key Management Service (KMS)
- Zendesk Query orchestration component for loading data from the Zendesk customer relationship system
- Mixpanel Query orchestration component for loading data from Mixpanel product analytics system
- Xero Query orchestration component for loading data from the Xero accounting system
- Dynamics 365 Query orchestration component for loading data from Microsoft Dynamics CRM/ERP
- API Profile RSD Generator
- Accelerate the development of API Profiles using a new tool that automatically generates a basic XML “RSD descriptor” for any API endpoint, based on a sample of data returned
- REST API Version 1 – Matillion ETL now has full API coverage:-
- You can now read/write more assets (JDBC Drivers, credentials, SQS configuration) as well as allowing finer-control of which resources to include
- A map of the v1 API is available here
- The “v0” api is still available and unchanged
- Grid Variables System
- In addition to “scalar” (single-valued) variables, you can now define grid variables to hold lists and grids of values; use them wherever a compatible list or grid of values is required
- Grid variables can be manipulated/modified in Python
- You can pass values for grid variables when starting a job via SQS and/or the V1 API
- You can now disable parts of an Orchestration job
- Improved Matching in column mappings – Many transformation component “Column Mapping” parameters can now be automatically mapped, even when the input and output column names are similar but not identical
- You can now delete tables/views directly from the Environment tree
- External Tables based on Amazon Redshift Spectrum now support skipping header rows
Important (possible breaking change): API Profiles (“RSD’s”) that handle paging may need to be tweaked to disable “auto” paging. Please see here for more details.
Important (possible breaking change): API profile limits are now applied. Where the default of 100 is set it will now be applied. This could affect API Query Components which previously ignored that limit.
- Redshift now supports Real/Double data types
- The External Table Output component for Redshift Spectrum now has partitioning support.
- New “Copy Table to External Schema” GUI tool generates new “Table Input/External Table Output” components with full partitioning support.
- Perfect for customers who want to try out Redshift Spectrum with their existing data.
- Redshift now supports “late binding” views on input components and view creation.
- The Google BigQuery Query component now supports standard SQL.
- New Server Migration Tool makes it easy to migrate all configuration including Oauth, API Profiles and Drivers in addition to projects to a new Matillion Cluster.
- Redesigned “Scheduler” user interface to simplify the management of scheduled orchestration jobs.
- New “Task Info” panel and “Task” panel make it much easier to understand complex tasks both at run time and after job execution.
- Matillion variables can be defined and scoped at job level making jobs much more reusable. Variables can now be passed to and returned from jobs.
- New Quickbooks Online Query component to connect to the popular online accounting system.
- New Square Query component to connect to the payment system.
- New Google Custom Search component allows google search data to be ingested.
- All data-staging components can append rows to an existing table as well as creating new tables.
- Matillion ETL for Amazon Redshift introduces the ability to configure Matillion ETL in a highly available topology with fully active-active cluster. This feature is only available on large and xlarge instance types.
- Jobs run from SQS, the API or the built-in Scheduler will now fail-over in the event of an instance failure.
- Scheduled runs missed because a server is offline will be run when it becomes available again.
- Once two or more members are in the cluster, a Cluster Info tab shows membership status and activity.
- OAuth tokens, Database Drivers and RSD Profiles are replicated via the persistence database (postgres).
- Logging from each node is sent to Cloudwatch.
- Cloudformation Templates help you get started with a clustered Matillion.
- New Jira Query component loads data from Atlassian’s popular Software Development Platform.
- New PayPal Query component can load payment and other data from Paypal Business accounts.
- New ServiceNow Query component loads data from Servicenow’s IT Service Management (ITSM) platform.
- New Stripe Query component loads data from Stripe’s payment platform
- New Email Query component can query an IMAP based email system.
- New YouTube Analytics component can query data from the YouTube Analytics API.
- Excel Query can now load files from Google Cloud Storage, as well as Amazon S3
- You only see S3 and/or GCS when you have credentials in the environment, otherwise they are hidden.
- New option to drop a schema from the Environment Tree.
- Specify a region in S3 Unload (to allow writing to buckets outside of the Redshift cluster’s region)
- S3 / Google Cloud Storage file browser enhancements.
- Set advanced connection options during OAuth flow (e.g. to connect to a Salesforce Sandbox)
- Warning: Manage Backups and View Audit haven’t been removed, they have been moved to the Admin menu.
- New External Table Output component – similar to Table Output but creates a Amazon Redshift Spectrum table over S3 data.
- New “Add Partition” component (Amazon Redshift Spectrum only).
- New “Delete Partition” component (Amazon Redshift Spectrum only).
- Other Amazon Redshift Spectrum components have new Table Partitioning parameters.
- Amazon Redshift Spectrum support. You can now run SQL Queries in redshift directly against data sets in your S3 data lake in Text, Parquet, SequenceFile and other formats. Matillion ETL 1.28 introduces first-class support for all key Redshift Spectrum features and will allow users to combine Amazon Redshift Spectrum data with regular Redshift data in transformations. These include:
- Components for creating an External Tables over S3 Data.
- Rewrite External Table writes redshift data into S3 and defines an external table to reference it.
- All data-staging orchestration components (all components ending in “query”) can write data to S3 and generate a compatible External Table to reference it. This will allow users to keep both small data sets and very large data sets in S3.
- Amazon Redshift Spectrum schemas and tables are displayed in the Environment Tree.
- Matillion no longer relies on creating views in Redshift to represent components.
- Post upgrade we recommend using the “Delete Views” function on the Environment to remove existing views generated by Matillion. These will not be recreated and any other v_xxxxxxxxxx_xxxxxxxxxx views can be safely, manually removed.
- A new Admin Menu allows administrators to:
- Get the server log.
- Update the Matillion server version.
- Configure users (using either an internal user database or external directory server).
- Configure SSL.
- Note: This will replace the existing /admin application. This is currently retained on upgrade but will be removed in a future update. Please use the new Admin Menu where possible.
- All transformation components support multiple outputs.
- Separate Replicate component no longer mandatory.
- Enterprise Features (these features are only enabled for users running m4.large or m4.xlarge instance types).
- Automatic Job Documentation. Matillion ETL can automatically generate documentation for your ETL process. This tool will recursively search your jobs and include all job detail including linked notes and descriptions.
- Auditing of User Actions with searchable Audit Log provides fine grained audit of every change to an ETL process.
- Ability to use Matillion ETL with an external postgresql repository on RDS. Allows you to externalise all your Matillion Job and configuration data to RDS and take advantage or RDS features such as backups and point-in-time recovery. Please contact Matillion Support if you wish to take advantage of this.
- Database Query now supports IBM Netezza data warehouses via JDBC.
- S3 Server Side Encryption. Data written to S3 from any Query component, the S3 Put component or the S3 Unload component can now apply Server Side Encryption (SSE-S3 or SSE-KMS)
- The Python Script component has been upgraded and now supports use of Jython (the default), Python 2 and Python 3. This is useful for customers who wish to use pip modules that are not pure python.
- New Search system. Find jobs, notes and component properties anywhere in a project via a new Search tab.
- An upgraded UI toolkit delivers a smoother, faster user experience.
- Upgrades to the Task Panel add the ability for the user to:-
- Multi-select tasks to cancel in the task panel.
- Collapse all expanded items.
- Remove all completed tasks.
- Export Jobs now allows you to multi-select a choice of jobs in the job tree and export them.
- The S3 Put component now allows the user to grab data from a self-signed HTTPS endpoint.
- Google Cloud users – Also check out new Matillion ETL for BigQuery.
- A new Youtube Query Component.
- A new EMR Load Component to make it easier to natively load EMR data sets.
- Environment Explorer Tree shows UDFs, Primary Keys, Sort Keys, Distribution Keys.
- Validation of Orchestration tasks now run in the background and appear in the Task panel (in the same way as Transformation tasks) this is more predictable, particularly for components that take longer to validate against 3rd party API’s.
- A new Connection Manager allows you to see and control connected sessions. This will also prevent users from being locked out when they hit their connection limit.
- All data-loading components now support a “Load Options” parameter to control:-
- keeping the objects in S3 after the load completes for archive purposes.
- Turning off automatic compression analysis.
- Turning off automatic statistics gathering.
- Users (using internal security) can be added/removed without requiring a restart.
- The If component now logs the decision taken to the task panel. This will help users diagnose decision logic problems.
- Matillion now runs on Java 8.
New Data Connectors
- New Text Output orchestration component simplifies export to CSV and other Text based formats.
- Similar to S3 Unload, with support for headers
- File Iterator now supports S3, you can loop over a list of files in an S3 bucket.
- KMS Encryption option in password manager allows you to use AWS managed encryption keys to encrypt passwords in Matillion ETL.
- Run Transformation / Run Orchestration components now support variable overrides to make it easier to run jobs in a reusable manner.
- Added support for the boolean data type.
- The scheduling test will check your maintenance window and warn of possible overlaps.
- Some orchestration components such as Create Table have an SQL Tab so it is easy to understand the generated SQL.
- Additional methods available on Matillion date variables will simplify using dates in variables.
- New cleaned up and simplified sample tab.
- Hundreds of other tweaks, minor improvements and bug fixes.
New Data Connectors
- Delete Tables – Remove table such as temp tables as part of an Orchestration.
- S3 Get Object – Get S3 Objects and push them SFTP, HDFS and Windows File Shares.
- File Iterator – Iterate over a list of pattern matched objects in an FTP, SFTP, HDFS or Windows Fileshare.
S3 Load Generator
- This tool helps generate compatible “Create Table” and “S3 Load” components by sampling delimited data files on S3 and guessing the layout.
- Private projects can be created
- Projects have an owner who controls which other users can collaborate
- You may enable automated daily backups of the Matillion ETL for Amazon Redshift instance root volume
New Chat and Presence Features
- You can see who else is collaborating with you, and chat to them. Chats are persisted to provide context on your project
- Create Table and Fixed Flow components support additional data types (Integer, Date). More to follow.
- S3 Put Object now supports S3 as a source (in case you have ZIP files on S3 that need unpacking before loading to Redshift).
- The SQL component can now be used at the beginning of a flow.
- We now include an API profile for Matillion’s API to copy the run history to Redshift. The API Query component can be used to query this data and import to Redshift.
Plus hundreds of minor improvement and bug fixes.
- The Sample tab now allows filtering to assist debugging complex transformations.
- Real-time validation of expressions in the Expression Editor
- Your syntax is checked by Redshift as you type.
- Jobs and folders in the explorer can be moved and copied in bulk.
- Improved editor windows. You can see available variables and test your code without leaving the editor when writing Python and SQL Scripts.
- Notes can now include bold, underlined and italic text, as well as hyperlinks.
- The Task History is now searchable, and opens in a separate tab.
- In the environment navigation browse the available tables, views and columns within each environment; drag and drop them into a Transformation.
- The S3 Load Component can specify an IAM Role ARN that is attached to your Redshift cluster.
- On the Table Output Component “Analyze Compression” now supported an “If not compressed already” setting.
- Python modules can now be installed with ‘pip’, and the latest boto3 API is now included by default for interaction with AWS services.
- The RDS Bulk Output Component now supports output to Postgresql databases.
- The S3 Put Component can now read directly from HDFS.
- Please backup the instance before upgrading! https://redshiftsupport.matillion.com/customer/en/portal/articles/1991953-upgrading-matillion-etl-for-redshift?b_id=8915
- If you have version 1.21.5 and have the admin app set up you can upgrade using the Admin App.
- If for any reason the upgrade fails, restore from backup and please contact support.
- Non-blocking task queue allows users to collaborate more seamlessly without being blocked by each others requests.
- Multiple runs of the same job will queue.
- All other runs may happen concurrently, regardless of the environment
- Load data from Hubspot with the HubSpot Query Component.
- Load Odata Sources with the OData Query Component.
- Load Microsoft Excel Spreadsheets with the Excel Query component.
- Load Google AdWords data with the Google AdWords Query component.
- SFTP Put Object component will allow you to write transformed data from Redshift back to an SFTP server.
- Retry Component allows automatic retrying and backoff which is most useful for 3rd party API’s that are not 100% reliable.
- S3 Put Object now supports copying a file from a Windows File Share.
- You can now run an orchestration job from part way through.
- Profile editor for bulding data profiles to describe how API’s map to tables and columns that can then be queried from the API Query Component
- Import/Export can now include details of Variables and Environments
- Notices/warnings/errors are now displayed on a new “Notices” tab.
- Preview API to import/export entire projects, run jobs, monitor running jobs.
- Can be used for integration to 3rd party source control management systems.
- Ask support for more details on how to get started with this
Plus hundreds of performance improvements and minor features.
- Please backup the instance before upgrading! https://redshiftsupport.matillion.com/customer/en/portal/articles/1991953-upgrading-matillion-etl-for-redshift?b_id=8915
- If you are upgrading Matillion to 1.21.5 from a previous release, once you apply the updates and restart the application server (tomcat), the internal job repository will be re-written in a new format. Please be patient and do not restart the server (or tomcat) during this period.
- If for any reason the upgrade fails, restore from backup and please contact support.
- Google Spreadsheets Query
- Marketo Query
- RDS Bulk Output (for Aurora, MySQL and MariaDB)
- Bash Script
- CloudWatch Publish
Project Selection and Organisation
- Open a job directly from the project chooser
- Recently opened jobs are tracked
- Project Selection is searchable
- Create your own folder structure in the project tree
- Centrally Managed Passwords
- Manage Users, Software Upgrades and more through a new Admin screen
- Copy/Paste settings between spreadsheets/text files and the Grid Editor
- SNS/SQS/RDS Components will offer Topics, Queues and Endpoints to choose from (if the given credentials allow it)
Plus dozens of other minor improvements and fixes!
- Concurrent execution of orchestration tasks (orchestration multi-threading).
- New load components for Salesforce, MongoDb, DynamoDb, Google BigQuery, Netsuite, Microsoft Dynamics and LDAP.
- S3 Put’ Zip Unpacking (unpack contents of a ZIP file to an S3 bucket from remote FTP/HTTP server).
- Support for Views as data input in Transformation.
- Support for Views as data input in Transformation. Support for Views as data output in Transformation.
- Teradata support added to Database Query component.
- RDS certificate support added to keystore functionality.
Over 90 other minor improvements and bug fixes.
- New load components for Facebook, Twitter and Google Analytics.
- New API Query component, loads data from any REST, JSON or SOAP API.
- Commitment control functionality.
- Detect changes component – for SCDs and real time updates
- Improved task cancellation
Over 100 other minor improvements
- New ‘If’ orchestration component
- New component variable features set (inc. row count, duration, error handling, performance information)
- Added S3 Manifest Writer component
- Added Transpose Rows component
- Added support for SSL
- Support for Redshift User Defined Functions
- Support for interleaved sorts
- Encryption on S3 load
- AVRO support for S3 load
- Improved script editors to add syntax highlighting and auto-complete
- DB2 in the database query component.
Dozens more minor improvements
- Multi-schema support
- Iteration added to orchestration flows (table, list and fixed)
- New S3 Put Component – load data from FTP/HTTP
- New Python component
- Manifest support in S3 loader added.
Lots of other minor improvements
- SQS Integration added
- Added ability to nest orchestration jobs
- New components added: SQS Message, SNS Message
- Load data from RDBMS (Database Query component)
- Manually or automatically set column encoding on Redshift tables
- Cancel task feature added
- New ‘Schema Copy’ component
Dozens of other minor enhancements
- New scheduler
- RDS query component added,
- UI enhancements (e.g. snap to grid)
- Analyze/Vacuum/Truncate components added
- New transformation components (Regex)
Lots of other minor improvements and bug fixes.
- New S3 Load/Unload components
- Added ability to manually define AWS credentials after AMI Launch
- Internal caching makes running the same jobs repeatedly much faster
- New components can (i) update table rows, (ii) delete table rows, (iii) split a field on a delimiter
72 other bug fixes and minor improvements