What's New & Latest Features | Matillion ETL for Amazon Redshift

What’s New

  • 04-10-2017 [1.29] High availability, new data load components, external table output and more

     

    • Matillion ETL for Amazon Redshift introduces the ability to configure Matillion ETL in a highly available topology with fully active-active cluster. This feature is only available on large and xlarge instance types.
      • Jobs run from SQS, the API or the built-in Scheduler will now fail-over in the event of an instance failure.
      • Scheduled runs missed because a server is offline will be run when it becomes available again.
      • Once two or more members are in the cluster, a Cluster Info tab shows membership status and activity.
      • OAuth tokens, Database Drivers and RSD Profiles are replicated via the persistence database (postgres).
      • Logging from each node is sent to Cloudwatch.
      • Cloudformation Templates help you get started with a clustered Matillion.
    • New Jira Query component loads data from Atlassian’s popular Software Development Platform.
    • New PayPal Query component can load payment and other data from Paypal Business accounts.
    • New ServiceNow Query component loads data from Servicenow’s IT Service Management (ITSM) platform.
    • New Stripe Query component loads data from Stripe’s payment platform
    • New Email Query component can query an IMAP based email system.
    • New YouTube Analytics component can query data from the YouTube Analytics API.
    • Excel Query can now load files from Google Cloud Storage, as well as Amazon S3
      • You only see S3 and/or GCS when you have credentials in the environment, otherwise they are hidden.
    • New option to drop a schema from the Environment Tree.
    • Specify a region in S3 Unload (to allow writing to buckets outside of the Redshift cluster’s region)
    • S3 / Google Cloud Storage file browser enhancements.
    • Set advanced connection options during OAuth flow (e.g. to connect to a Salesforce Sandbox)
    • Warning: Manage Backups and View Audit haven’t been removed, they have been moved to the Admin menu.
    • New External Table Output component – similar to Table Output but creates a Amazon Redshift Spectrum table over S3 data.
    • New “Add Partition” component (Amazon Redshift Spectrum only).
    • New “Delete Partition” component (Amazon Redshift Spectrum only).
    • Other Amazon Redshift Spectrum components have new Table Partitioning parameters.
  • 29-06-2017 [1.28] Amazon Redshift Spectrum Support, New Admin Menu, Enterprise Features and more

     

    • Amazon Redshift Spectrum support. You can now run SQL Queries in redshift directly against data sets in your S3 data lake in Text, Parquet, SequenceFile and other formats. Matillion ETL 1.28 introduces first-class support for all key Redshift Spectrum features and will allow users to combine Amazon Redshift Spectrum data with regular Redshift data in transformations. These include:
      • Components for creating an External Tables over S3 Data.
      • Rewrite External Table writes redshift data into S3 and defines an external table to reference it.
      • All data-staging orchestration components (all components ending in “query”) can write data to S3 and generate a compatible External Table to reference it. This will allow users to keep both small data sets and very large data sets in S3.
      • Amazon Redshift Spectrum schemas and tables are displayed in the Environment Tree.
    • Matillion no longer relies on creating views in Redshift to represent components.
      • Post upgrade we recommend using the “Delete Views” function on the Environment to remove existing views generated by Matillion. These will not be recreated and any other v_xxxxxxxxxx_xxxxxxxxxx views can be safely, manually removed.
    • A new Admin Menu allows administrators to:
      • Get the server log.
      • Update the Matillion server version.
      • Configure users (using either an internal user database or external directory server).
      • Configure SSL.
      • Note: This will replace the existing /admin application. This is currently retained on upgrade but will be removed in a future update. Please use the new Admin Menu where possible.
    • All transformation components support multiple outputs.
      • Separate Replicate component no longer mandatory.
    • Enterprise Features (these features are only enabled for users running m4.large or m4.xlarge instance types).
      • Automatic Job Documentation. Matillion ETL can automatically generate documentation for your ETL process. This tool will recursively search your jobs and include all job detail including linked notes and descriptions.
      • Auditing of User Actions with searchable Audit Log provides fine grained audit of every change to an ETL process.
      • Ability to use Matillion ETL with an external postgresql repository on RDS. Allows you to externalise all your Matillion Job and configuration data to RDS and take advantage or RDS features such as backups and point-in-time recovery. Please contact Matillion Support if you wish to take advantage of this.
    • Database Query now supports IBM Netezza data warehouses via JDBC.
    • S3 Server Side Encryption. Data written to S3 from any Query component, the S3 Put component or the S3 Unload component can now apply Server Side Encryption (SSE-S3 or SSE-KMS)

     

  • 09-05-2017 [1.27] Matillion ETL for BigQuery, New Search System, Python Script Upgrades, UI Improvements and more
    • The Python Script component has been upgraded and now supports use of Jython (the default), Python 2 and Python 3. This is useful for customers who wish to use pip modules that are not pure python.
    • New Search system. Find jobs, notes and component properties anywhere in a project via a new Search tab.
    • An upgraded UI toolkit delivers a smoother, faster user experience.
    • Upgrades to the Task Panel add the ability for the user to:-
      • Multi-select tasks to cancel in the task panel.
      • Collapse all expanded items.
      • Remove all completed tasks.
    • Export Jobs now allows you to multi-select a choice of jobs in the job tree and export them.
    • The S3 Put component now allows the user to grab data from a self-signed HTTPS endpoint.
    • Google Cloud users – Also check out new Matillion ETL for BigQuery.
  • 13-03-2017 [1.26] EMR Load component, Youtube Query, Connection Manager, Java 8 Compatibility and more

     

    • A new Youtube Query Component.
    • A new EMR Load Component to make it easier to natively load EMR data sets.
    • Environment Explorer Tree shows UDFs, Primary Keys, Sort Keys, Distribution Keys.
    • Validation of Orchestration tasks now run in the background and appear in the Task panel (in the same way as Transformation tasks) this is more predictable, particularly for components that take longer to validate against 3rd party API’s.
    • A new Connection Manager allows you to see and control connected sessions. This will also prevent users from being locked out when they hit their connection limit.
    • All data-loading components now support a “Load Options” parameter to control:-
      • keeping the objects in S3 after the load completes for archive purposes.
      • Turning off automatic compression analysis.
      • Turning off automatic statistics gathering.
    • Users (using internal security) can be added/removed without requiring a restart.
    • The If component now logs the decision taken to the task panel. This will help users diagnose decision logic problems.
    • Matillion now runs on Java 8.
  • 18-01-2017 [1.25] Enhanced Encryption, Text Output Component, New Data Connectors and more

    New Data Connectors

    New Components

    • New Text Output orchestration component simplifies export to CSV and other Text based formats.
      • Similar to S3 Unload, with support for headers

    Other Changes

    • File Iterator now supports S3, you can loop over a list of files in an S3 bucket.
    • KMS Encryption option in password manager allows you to use AWS managed encryption keys to encrypt passwords in Matillion ETL.
    • Run Transformation / Run Orchestration  components now support variable overrides to make it easier to run jobs in a reusable manner.
    • Added support for the boolean data type.
    • The scheduling test will check your maintenance window and warn of possible overlaps.
    • Some orchestration components such as Create Table have an SQL Tab so it is easy to understand the generated SQL.
    • Additional methods available on Matillion date variables will simplify using dates in variables.
    • New cleaned up and simplified sample tab.
    • Hundreds of other tweaks, minor improvements and bug fixes.
  • 22-11-2016 [1.24] S3 Load Generator, Project Sharing, Chat Features, New Data Connectors and more

     

    New Data Connectors

    New Components

    • Delete Tables – Remove table such as temp tables as part of an Orchestration.
    • S3 Get Object – Get S3 Objects and push them SFTP, HDFS and Windows File Shares.
    • File Iterator – Iterate over a list of pattern matched objects in an  FTP, SFTP, HDFS or Windows Fileshare.

    S3 Load Generator

    • This tool helps generate compatible “Create Table” and “S3 Load” components by sampling delimited data files on S3 and guessing the layout.

    Project Sharing

    • Private projects can be created
    • Projects have an owner who controls which other users can collaborate

    Automated Backups

    • You may enable automated daily backups of the Matillion ETL for Redshift instance root volume

    New Chat and Presence Features

    • You can see who else is collaborating with you, and chat to them. Chats are persisted to provide context on your project

    Other Improvements

    • Create Table and Fixed Flow components support additional data types (Integer, Date). More to follow.
    • S3 Put Object now supports S3 as a source (in case you have ZIP files on S3 that need unpacking before loading to Redshift).
    • The SQL component can now be used at the beginning of a flow.
    • We now include an API profile for Matillion’s API to copy the run history to Redshift. The API Query component can be used to query this data and import to Redshift.

    Plus hundreds of minor improvement and bug fixes.

     

  • 23-09-2016 [1.23] Real-time validation, improved editor windows, searchable task history, component improvements and more

    New features
    • The Sample tab now allows filtering to assist debugging complex transformations.
    • Real-time validation of expressions in the Expression Editor
    • Your syntax is checked by Redshift as you type.
    • Jobs and folders in the explorer can be moved and copied in bulk.
    • Improved editor windows. You can see available variables and test your code without leaving the editor when writing Python and SQL Scripts.
    • Notes can now include bold, underlined and italic text, as well as hyperlinks.
    • The Task History is now searchable, and opens in a separate tab.
    • In the environment navigation browse the available tables, views and columns within each environment; drag and drop them into a Transformation.

     

    Component improvements
    • The S3 Load Component can specify an IAM Role ARN that is attached to your Redshift cluster.
    • On the Table Output Component “Analyze Compression” now supported an “If not compressed already” setting.
    • Python modules can now be installed with ‘pip’, and the latest boto3 API is now included by default for interaction with AWS services.
    • The RDS Bulk Output Component now supports output to Postgresql databases.
    • The S3 Put Component can now read directly from HDFS.
  • 02-08-2016 [1.22] Non-blocking task queue, profile editor, preview API, new data load components and more

    Upgrades​

    • Non-blocking task queue allows users to collaborate more seamlessly without being blocked by each others requests.
      • Multiple runs of the same job will queue.
      • All other runs may happen concurrently, regardless of the environment

    New Components

    • Load data from Hubspot with the HubSpot Query Component.
    • Load Odata Sources with the OData Query Component.
    • Load Microsoft Excel Spreadsheets with the Excel Query component.
    • Load Google AdWords data with the Google AdWords Query component.
    • SFTP Put Object component will allow you to write transformed data from Redshift back to an SFTP server.
    • Retry Component allows automatic retrying and backoff which is most useful for 3rd party API’s that are not 100% reliable.

    Enhancements

    • S3 Put Object now supports copying a file from a Windows File Share.
    • You can now run an orchestration job from part way through.
    • Profile editor for bulding data profiles to describe how API’s map to tables and columns that can then be queried from the API Query Component
    • Import/Export can now include details of Variables and Environments
    • Notices/warnings/errors are now displayed on a new “Notices” tab.
    • Preview API to import/export entire projects, run jobs, monitor running jobs.
      • Can be used for integration to 3rd party source control management systems.
      • Ask support for more details on how to get started with this

    Plus hundreds of performance improvements and minor features.

  • 07-06-2016 [1.21] Project selection and organisation, enhanced user management, new data load components and more

    Upgrades

    New Components

    •  Google Spreadsheets Query
    • Marketo Query
    • RDS Bulk Output (for Aurora, MySQL and MariaDB)
    • Bash Script
    • CloudWatch Publish

    Project Selection and Organisation

    • Open a job directly from the project chooser
    • Recently opened jobs are tracked
    • Project Selection is searchable
    • Create your own folder structure in the project tree

    Other Enhancements

    • Centrally Managed Passwords
    • Manage Users, Software Upgrades and more through a new Admin screen
    • Copy/Paste settings between spreadsheets/text files and the Grid Editor
    • SNS/SQS/RDS Components will offer Topics, Queues and Endpoints to choose from (if the given credentials allow it)

    Plus dozens of other minor improvements and fixes!

  • 22-04-2016 [1.20] Multi-threaded orchestration, new data load components and lots more

    Key Features
    • Concurrent execution of orchestration tasks (orchestration multi-threading).
    • New load components for Salesforce, MongoDb, DynamoDb, Google BigQuery, Netsuite, Microsoft Dynamics and LDAP.

     

    Other Features
    • S3 Put’ Zip Unpacking (unpack contents of a ZIP file to an S3 bucket from remote FTP/HTTP server).
    • Support for Views as data input in Transformation.
    • Support for Views as data input in Transformation. Support for Views as data output in Transformation.
    • Teradata support added to Database Query component.
    • RDS certificate support added to keystore functionality.

     

    Over 90 other minor improvements and bug fixes.

  • 26-02-2016 [1.19] Social media and API support. Commitment control. Improvement for SCDs.

    Key Features
    • New load components for Facebook, Twitter and Google Analytics.
    • New API Query component, loads data from any REST, JSON or SOAP API.
    • Commitment control functionality.

     

    Other Features
    • Detect changes component – for SCDs and real time updates
    • Improved task cancellation

     

    Over 100 other minor improvements

  • 15-01-2016 [1.18] Improved Orchestration flow control and component variables. Improved security support. Redshift UDF support.

    Key Features
    • New ‘If’ orchestration component
    • New component variable features set (inc. row count, duration, error handling, performance information)
    • Added S3 Manifest Writer component
    • Added Transpose Rows component
    • Added support for SSL
    • Support for Redshift User Defined Functions

     

    Other Features
    • Support for interleaved sorts
    • Encryption on S3 load
    • AVRO support for S3 load
    • Improved script editors to add syntax highlighting and auto-complete
    • DB2 in the database query component.

     

    Dozens more minor improvements

  • 30-11-2015 [1.17] Iteration, multi-schema support and FTP/HTTP data load functionality.
    releases-python-script
    Key Features
    • Multi-schema support
    • Iteration added to orchestration flows (table, list and fixed)
    • New S3 Put Component – load data from FTP/HTTP
    • New Python component

     

    Other Features
    • Manifest support in S3 loader added.

     

    Lots of other minor improvements

  • 21-10-2015 [1.16] SQS Integration, SQS Message, SNS Message and Load data from RDBMS
    releases-116-sqs-images
    Key Features
    • SQS Integration added
    • Added ability to nest orchestration jobs
    • New components added: SQS Message, SNS Message
    • Load data from RDBMS (Database Query component)

     

    Other Features
    • Manually or automatically set column encoding on Redshift tables
    • Cancel task feature added
    • New ‘Schema Copy’ component

     

    Dozens of other minor enhancements

  • 08-08-2015 [1.15] New scheduler. Load data directly from Amazon RDS. Improved UI.
    Key Features
    • New scheduler
    • RDS query component added,
    • UI enhancements (e.g. snap to grid)

     

    Other Features
    • Analyze/Vacuum/Truncate components added
    • New transformation components (Regex)

     

    Lots of other minor improvements and bug fixes.

  • 29-07-2015 [1.14] S3 load/unload. Improved performance and AWS integration.
    Key Features
    • New S3 Load/Unload components

     

    Other Features
    • Added ability to manually define AWS credentials after AMI Launch
    • Internal caching makes running the same jobs repeatedly much faster
    • New components can (i) update table rows, (ii) delete table rows, (iii) split a field on a delimiter

     

    72 other bug fixes and minor improvements