What's New & Latest Features | Matillion ETL for Amazon Redshift

What’s New

  • 20-02-2018 [1.31] API profile generator, grid variables, improved matching, new data load components and more
    • Zendesk Query orchestration component for loading data from the Zendesk customer relationship system
    • Mixpanel Query orchestration component for loading data from Mixpanel product analytics system
    • Xero Query orchestration component for loading data from the Xero accounting system
    • Dynamics 365 Query orchestration component for loading data from Microsoft Dynamics CRM/ERP
    • API Profile RSD Generator
      • Accelerate the development of API Profiles using a new tool that automatically generates a basic XML “RSD descriptor” for any API endpoint, based on a sample of data returned
    • REST API Version 1 – Matillion ETL now has full API coverage:-
      • You can now read/write more assets (JDBC Drivers, credentials, SQS configuration) as well as allowing finer-control of which resources to include
      • A map of the v1 API is available here
      • The “v0” api is still available and unchanged
    • Grid Variables System
      • In addition to “scalar” (single-valued) variables, you can now define grid variables to hold lists and grids of values; use them wherever a compatible list or grid of values is required
      • Grid variables can be manipulated/modified in Python
      • You can pass values for grid variables when starting a job via SQS and/or the V1 API
    • You can now disable parts of an Orchestration job
    • Improved Matching in column mappings – Many transformation component “Column Mapping” parameters can now be automatically mapped, even when the input and output column names are similar but not identical
    • You can now delete tables/views directly from the Environment tree
    • External Tables based on Amazon Redshift Spectrum now support skipping header rows


    Important (possible breaking change): API Profiles (“RSD’s”) that handle paging may need to be tweaked to disable “auto” paging. Please see here for more details.

    Important (possible breaking change): API profile limits are now applied. Where the default of 100 is set it will now be applied. This could affect API Query Components which previously ignored that limit.

  • 23-11-2017 [1.30] Partitioning Support for Spectrum, Improved Task Info Panel, Server Migration Tool, New Data Load Components and more
    • Redshift now supports Real/Double data types
    • The External Table Output component for Redshift Spectrum now has partitioning support.
    • New “Copy Table to External Schema” GUI tool generates new “Table Input/External Table Output” components with full partitioning support.
      • Perfect for customers who want to try out Redshift Spectrum with their existing data.
    • Redshift now supports “late binding” views on input components and view creation.
    • The Google BigQuery Query component now supports standard SQL.
    • New Server Migration Tool makes it easy to migrate all configuration including Oauth, API Profiles and Drivers in addition to projects to a new Matillion Cluster.
    • Redesigned “Scheduler” user interface to simplify the management of scheduled orchestration jobs.
    • New “Task Info” panel and “Task” panel make it much easier to understand complex tasks both at run time and after job execution.
    • Matillion variables can be defined and scoped at job level making jobs much more reusable. Variables can now be passed to and returned from jobs.
    • New Quickbooks Online Query component to connect to the popular online accounting system.
    • New Square Query component to connect to the payment system.
    • New Google Custom Search component allows google search data to be ingested.
    • All data-staging components can append rows to an existing table as well as creating new tables.
  • 04-10-2017 [1.29] High availability, new data load components, external table output and more


    • Matillion ETL for Amazon Redshift introduces the ability to configure Matillion ETL in a highly available topology with fully active-active cluster. This feature is only available on large and xlarge instance types.
      • Jobs run from SQS, the API or the built-in Scheduler will now fail-over in the event of an instance failure.
      • Scheduled runs missed because a server is offline will be run when it becomes available again.
      • Once two or more members are in the cluster, a Cluster Info tab shows membership status and activity.
      • OAuth tokens, Database Drivers and RSD Profiles are replicated via the persistence database (postgres).
      • Logging from each node is sent to Cloudwatch.
      • Cloudformation Templates help you get started with a clustered Matillion.
    • New Jira Query component loads data from Atlassian’s popular Software Development Platform.
    • New PayPal Query component can load payment and other data from Paypal Business accounts.
    • New ServiceNow Query component loads data from Servicenow’s IT Service Management (ITSM) platform.
    • New Stripe Query component loads data from Stripe’s payment platform
    • New Email Query component can query an IMAP based email system.
    • New YouTube Analytics component can query data from the YouTube Analytics API.
    • Excel Query can now load files from Google Cloud Storage, as well as Amazon S3
      • You only see S3 and/or GCS when you have credentials in the environment, otherwise they are hidden.
    • New option to drop a schema from the Environment Tree.
    • Specify a region in S3 Unload (to allow writing to buckets outside of the Redshift cluster’s region)
    • S3 / Google Cloud Storage file browser enhancements.
    • Set advanced connection options during OAuth flow (e.g. to connect to a Salesforce Sandbox)
    • Warning: Manage Backups and View Audit haven’t been removed, they have been moved to the Admin menu.
    • New External Table Output component – similar to Table Output but creates a Amazon Redshift Spectrum table over S3 data.
    • New “Add Partition” component (Amazon Redshift Spectrum only).
    • New “Delete Partition” component (Amazon Redshift Spectrum only).
    • Other Amazon Redshift Spectrum components have new Table Partitioning parameters.
  • 29-06-2017 [1.28] Amazon Redshift Spectrum Support, New Admin Menu, Enterprise Features and more


    • Amazon Redshift Spectrum support. You can now run SQL Queries in redshift directly against data sets in your S3 data lake in Text, Parquet, SequenceFile and other formats. Matillion ETL 1.28 introduces first-class support for all key Redshift Spectrum features and will allow users to combine Amazon Redshift Spectrum data with regular Redshift data in transformations. These include:
      • Components for creating an External Tables over S3 Data.
      • Rewrite External Table writes redshift data into S3 and defines an external table to reference it.
      • All data-staging orchestration components (all components ending in “query”) can write data to S3 and generate a compatible External Table to reference it. This will allow users to keep both small data sets and very large data sets in S3.
      • Amazon Redshift Spectrum schemas and tables are displayed in the Environment Tree.
    • Matillion no longer relies on creating views in Redshift to represent components.
      • Post upgrade we recommend using the “Delete Views” function on the Environment to remove existing views generated by Matillion. These will not be recreated and any other v_xxxxxxxxxx_xxxxxxxxxx views can be safely, manually removed.
    • A new Admin Menu allows administrators to:
      • Get the server log.
      • Update the Matillion server version.
      • Configure users (using either an internal user database or external directory server).
      • Configure SSL.
      • Note: This will replace the existing /admin application. This is currently retained on upgrade but will be removed in a future update. Please use the new Admin Menu where possible.
    • All transformation components support multiple outputs.
      • Separate Replicate component no longer mandatory.
    • Enterprise Features (these features are only enabled for users running m4.large or m4.xlarge instance types).
      • Automatic Job Documentation. Matillion ETL can automatically generate documentation for your ETL process. This tool will recursively search your jobs and include all job detail including linked notes and descriptions.
      • Auditing of User Actions with searchable Audit Log provides fine grained audit of every change to an ETL process.
      • Ability to use Matillion ETL with an external postgresql repository on RDS. Allows you to externalise all your Matillion Job and configuration data to RDS and take advantage or RDS features such as backups and point-in-time recovery. Please contact Matillion Support if you wish to take advantage of this.
    • Database Query now supports IBM Netezza data warehouses via JDBC.
    • S3 Server Side Encryption. Data written to S3 from any Query component, the S3 Put component or the S3 Unload component can now apply Server Side Encryption (SSE-S3 or SSE-KMS)


  • 09-05-2017 [1.27] Matillion ETL for BigQuery, New Search System, Python Script Upgrades, UI Improvements and more
    • The Python Script component has been upgraded and now supports use of Jython (the default), Python 2 and Python 3. This is useful for customers who wish to use pip modules that are not pure python.
    • New Search system. Find jobs, notes and component properties anywhere in a project via a new Search tab.
    • An upgraded UI toolkit delivers a smoother, faster user experience.
    • Upgrades to the Task Panel add the ability for the user to:-
      • Multi-select tasks to cancel in the task panel.
      • Collapse all expanded items.
      • Remove all completed tasks.
    • Export Jobs now allows you to multi-select a choice of jobs in the job tree and export them.
    • The S3 Put component now allows the user to grab data from a self-signed HTTPS endpoint.
    • Google Cloud users – Also check out new Matillion ETL for BigQuery.
  • 13-03-2017 [1.26] EMR Load component, Youtube Query, Connection Manager, Java 8 Compatibility and more


    • A new Youtube Query Component.
    • A new EMR Load Component to make it easier to natively load EMR data sets.
    • Environment Explorer Tree shows UDFs, Primary Keys, Sort Keys, Distribution Keys.
    • Validation of Orchestration tasks now run in the background and appear in the Task panel (in the same way as Transformation tasks) this is more predictable, particularly for components that take longer to validate against 3rd party API’s.
    • A new Connection Manager allows you to see and control connected sessions. This will also prevent users from being locked out when they hit their connection limit.
    • All data-loading components now support a “Load Options” parameter to control:-
      • keeping the objects in S3 after the load completes for archive purposes.
      • Turning off automatic compression analysis.
      • Turning off automatic statistics gathering.
    • Users (using internal security) can be added/removed without requiring a restart.
    • The If component now logs the decision taken to the task panel. This will help users diagnose decision logic problems.
    • Matillion now runs on Java 8.
  • 18-01-2017 [1.25] Enhanced Encryption, Text Output Component, New Data Connectors and more

    New Data Connectors

    New Components

    • New Text Output orchestration component simplifies export to CSV and other Text based formats.
      • Similar to S3 Unload, with support for headers

    Other Changes

    • File Iterator now supports S3, you can loop over a list of files in an S3 bucket.
    • KMS Encryption option in password manager allows you to use AWS managed encryption keys to encrypt passwords in Matillion ETL.
    • Run Transformation / Run Orchestration  components now support variable overrides to make it easier to run jobs in a reusable manner.
    • Added support for the boolean data type.
    • The scheduling test will check your maintenance window and warn of possible overlaps.
    • Some orchestration components such as Create Table have an SQL Tab so it is easy to understand the generated SQL.
    • Additional methods available on Matillion date variables will simplify using dates in variables.
    • New cleaned up and simplified sample tab.
    • Hundreds of other tweaks, minor improvements and bug fixes.
  • 22-11-2016 [1.24] S3 Load Generator, Project Sharing, Chat Features, New Data Connectors and more


    New Data Connectors

    New Components

    • Delete Tables – Remove table such as temp tables as part of an Orchestration.
    • S3 Get Object – Get S3 Objects and push them SFTP, HDFS and Windows File Shares.
    • File Iterator – Iterate over a list of pattern matched objects in an  FTP, SFTP, HDFS or Windows Fileshare.

    S3 Load Generator

    • This tool helps generate compatible “Create Table” and “S3 Load” components by sampling delimited data files on S3 and guessing the layout.

    Project Sharing

    • Private projects can be created
    • Projects have an owner who controls which other users can collaborate

    Automated Backups

    • You may enable automated daily backups of the Matillion ETL for Redshift instance root volume

    New Chat and Presence Features

    • You can see who else is collaborating with you, and chat to them. Chats are persisted to provide context on your project

    Other Improvements

    • Create Table and Fixed Flow components support additional data types (Integer, Date). More to follow.
    • S3 Put Object now supports S3 as a source (in case you have ZIP files on S3 that need unpacking before loading to Redshift).
    • The SQL component can now be used at the beginning of a flow.
    • We now include an API profile for Matillion’s API to copy the run history to Redshift. The API Query component can be used to query this data and import to Redshift.

    Plus hundreds of minor improvement and bug fixes.


  • 23-09-2016 [1.23] Real-time validation, improved editor windows, searchable task history, component improvements and more

    New features
    • The Sample tab now allows filtering to assist debugging complex transformations.
    • Real-time validation of expressions in the Expression Editor
    • Your syntax is checked by Redshift as you type.
    • Jobs and folders in the explorer can be moved and copied in bulk.
    • Improved editor windows. You can see available variables and test your code without leaving the editor when writing Python and SQL Scripts.
    • Notes can now include bold, underlined and italic text, as well as hyperlinks.
    • The Task History is now searchable, and opens in a separate tab.
    • In the environment navigation browse the available tables, views and columns within each environment; drag and drop them into a Transformation.


    Component improvements
    • The S3 Load Component can specify an IAM Role ARN that is attached to your Redshift cluster.
    • On the Table Output Component “Analyze Compression” now supported an “If not compressed already” setting.
    • Python modules can now be installed with ‘pip’, and the latest boto3 API is now included by default for interaction with AWS services.
    • The RDS Bulk Output Component now supports output to Postgresql databases.
    • The S3 Put Component can now read directly from HDFS.
  • 02-08-2016 [1.22] Non-blocking task queue, profile editor, preview API, new data load components and more


    • Non-blocking task queue allows users to collaborate more seamlessly without being blocked by each others requests.
      • Multiple runs of the same job will queue.
      • All other runs may happen concurrently, regardless of the environment

    New Components

    • Load data from Hubspot with the HubSpot Query Component.
    • Load Odata Sources with the OData Query Component.
    • Load Microsoft Excel Spreadsheets with the Excel Query component.
    • Load Google AdWords data with the Google AdWords Query component.
    • SFTP Put Object component will allow you to write transformed data from Redshift back to an SFTP server.
    • Retry Component allows automatic retrying and backoff which is most useful for 3rd party API’s that are not 100% reliable.


    • S3 Put Object now supports copying a file from a Windows File Share.
    • You can now run an orchestration job from part way through.
    • Profile editor for bulding data profiles to describe how API’s map to tables and columns that can then be queried from the API Query Component
    • Import/Export can now include details of Variables and Environments
    • Notices/warnings/errors are now displayed on a new “Notices” tab.
    • Preview API to import/export entire projects, run jobs, monitor running jobs.
      • Can be used for integration to 3rd party source control management systems.
      • Ask support for more details on how to get started with this

    Plus hundreds of performance improvements and minor features.

  • 07-06-2016 [1.21] Project selection and organisation, enhanced user management, new data load components and more


    New Components

    •  Google Spreadsheets Query
    • Marketo Query
    • RDS Bulk Output (for Aurora, MySQL and MariaDB)
    • Bash Script
    • CloudWatch Publish

    Project Selection and Organisation

    • Open a job directly from the project chooser
    • Recently opened jobs are tracked
    • Project Selection is searchable
    • Create your own folder structure in the project tree

    Other Enhancements

    • Centrally Managed Passwords
    • Manage Users, Software Upgrades and more through a new Admin screen
    • Copy/Paste settings between spreadsheets/text files and the Grid Editor
    • SNS/SQS/RDS Components will offer Topics, Queues and Endpoints to choose from (if the given credentials allow it)

    Plus dozens of other minor improvements and fixes!

  • 22-04-2016 [1.20] Multi-threaded orchestration, new data load components and lots more

    Key Features
    • Concurrent execution of orchestration tasks (orchestration multi-threading).
    • New load components for Salesforce, MongoDb, DynamoDb, Google BigQuery, Netsuite, Microsoft Dynamics and LDAP.


    Other Features
    • S3 Put’ Zip Unpacking (unpack contents of a ZIP file to an S3 bucket from remote FTP/HTTP server).
    • Support for Views as data input in Transformation.
    • Support for Views as data input in Transformation. Support for Views as data output in Transformation.
    • Teradata support added to Database Query component.
    • RDS certificate support added to keystore functionality.


    Over 90 other minor improvements and bug fixes.

  • 26-02-2016 [1.19] Social media and API support. Commitment control. Improvement for SCDs.

    Key Features
    • New load components for Facebook, Twitter and Google Analytics.
    • New API Query component, loads data from any REST, JSON or SOAP API.
    • Commitment control functionality.


    Other Features
    • Detect changes component – for SCDs and real time updates
    • Improved task cancellation


    Over 100 other minor improvements

  • 15-01-2016 [1.18] Improved Orchestration flow control and component variables. Improved security support. Redshift UDF support.

    Key Features
    • New ‘If’ orchestration component
    • New component variable features set (inc. row count, duration, error handling, performance information)
    • Added S3 Manifest Writer component
    • Added Transpose Rows component
    • Added support for SSL
    • Support for Redshift User Defined Functions


    Other Features
    • Support for interleaved sorts
    • Encryption on S3 load
    • AVRO support for S3 load
    • Improved script editors to add syntax highlighting and auto-complete
    • DB2 in the database query component.


    Dozens more minor improvements

  • 30-11-2015 [1.17] Iteration, multi-schema support and FTP/HTTP data load functionality.
    Key Features
    • Multi-schema support
    • Iteration added to orchestration flows (table, list and fixed)
    • New S3 Put Component – load data from FTP/HTTP
    • New Python component


    Other Features
    • Manifest support in S3 loader added.


    Lots of other minor improvements

  • 21-10-2015 [1.16] SQS Integration, SQS Message, SNS Message and Load data from RDBMS
    Key Features
    • SQS Integration added
    • Added ability to nest orchestration jobs
    • New components added: SQS Message, SNS Message
    • Load data from RDBMS (Database Query component)


    Other Features
    • Manually or automatically set column encoding on Redshift tables
    • Cancel task feature added
    • New ‘Schema Copy’ component


    Dozens of other minor enhancements

  • 08-08-2015 [1.15] New scheduler. Load data directly from Amazon RDS. Improved UI.
    Key Features
    • New scheduler
    • RDS query component added,
    • UI enhancements (e.g. snap to grid)


    Other Features
    • Analyze/Vacuum/Truncate components added
    • New transformation components (Regex)


    Lots of other minor improvements and bug fixes.

  • 29-07-2015 [1.14] S3 load/unload. Improved performance and AWS integration.
    Key Features
    • New S3 Load/Unload components


    Other Features
    • Added ability to manually define AWS credentials after AMI Launch
    • Internal caching makes running the same jobs repeatedly much faster
    • New components can (i) update table rows, (ii) delete table rows, (iii) split a field on a delimiter


    72 other bug fixes and minor improvements