Frequently Asked Questions | Matillion ETL for Amazon Redshift

FAQ

 

General

What is Matillion ETL for Amazon Redshift?


Matillion ETL for Amazon Redshift is a push-down ETL/ELT tool for Amazon Redshift. It allows you to create powerful data transformation jobs, and orchestrate ELT processes, in a graphical tool. It includes all the functional and nonfunctional features you need to build and maintain enterprise data warehouses and analytical databases on Amazon Redshift, at scale. Delivered via the AWS Marketplace as an AMI, the product is browser-based and can be setup in under 5 minutes.

Who is Matillion ETL for Amazon Redshift for? Should I consider using this product?


Matillion ETL for Amazon Redshift is designed for people building, or planning to build, data warehouses and analytical databases on Amazon Redshift.  It suits:

  • data professionals, ETL developers and data architects.  Including those who have previously used ETL tools like Informatica, Pentaho, Talend, IBM Datastage or Microsoft SSIS
  • those who have been hand-coding data integrations in SQL and scripts
  • because the tool is easy to use, it also suits more business focussed users – data scientists and business analysts for example

What are the benefits of Matillion compared to traditional ETL tools e.g. Talend, Pentaho, Informatica?


For Amazon Redshift users, Matillion delivers some significant benefits when compared to traditional ETL tools. It’s much faster. It’s simpler to use, install and manage. It’s scalable- scaling is cheap and simple.  It’s usually cheaper – both to purchase and also Total Cost of Ownership.

Performance: Matillion is an ELT tool meaning it pushes down data transformation to the Amazon Redshift cluster. This is more efficient than a traditional ETL approach when using an analytical database like Amazon Redshift.

Scalability: Because Matillion pushes-down data transformations to Amazon Redshift, it’s as scalable as Redshift is. To go faster, or handle more data, simply increase the power of the Amazon Redshift cluster.

Simplicity: Matillion was specifically designed with the modern data professional in mind. The product is simple yet capable and includes all the features you need to develop data warehouses and analytical databases at scale. Because Matillion ETL for Amazon Redshift uses the Redshift cluster to do the work of data transformation, there’s no complex infrastructure required. You don’t need to add Matillion servers to make your jobs go faster or handle more data. Finally the product is delivered as an AMI via AWS Marketplace, so install into your AWS environment is simple, secure and takes less than 5 minutes.

Cost: Matillion saves you money in 3 ways. It’s cheaper than many competitive solutions. No extra infrastructure is required to run it. And finally, because it’s so fast, it saves you time in the development and maintenance of ETL/ELT jobs.

How does Matillion ETL for Amazon Redshift work?


Matillion is actually an ELT tool meaning it pushes down data transformation work to the Amazon Redshift cluster. This is more efficient than a traditional ETL approach when using an analytical database like Amazon Redshift.

As you create data transformation jobs in Matillion ETL for Amazon Redshift, the tool issues instructions (as SQL) to the Amazon Redshift cluster.  A useful metaphor is to think of Matillion as the conductor standing in front of the orchestra that is your Amazon Redshift cluster. This delivers high performance and scalability, without needing to scale Matillion’s infrastructure.  It also means the tool has some useful features, like being able to view data live at each stage of a transformation job, without needing to compile and run.

On top of the core data transformation capabilities, Matillion ETL for Amazon Redshift also adds orchestration, allowing you to choreograph transformations, data load, integration with other systems, cluster maintenance, error handling, iteration, logging, scheduling, version control, monitoring, collaboration and environment support.

How is Matillion ETL for delivered? Is it a SaaS product, a client install etc?


Matillion ETL for Amazon Redshift is delivered as an AMI via the AWS Marketplace. This means you launch your own instance of the product into your own VPC. It is not a SaaS product – you do not connect to Matillion’s servers to run Matillion ETL for Amazon Redshift and as such, no external firewall rules or whitelisting is required.

Once your instance is stood-up and configured, you and your team of ETL developers/data professionals access the tool using your web browser.  The UI is websockets based delivering a high fidelity experience without the need for a client install.

Under the hood the product is an Amazon Linux instance running Tomcat, a database (used to hold config information and saved jobs) and the Matillion software (a Java application). You launch it in the same way you would launch any other AMI, direct inside your EC2 console or from the AWS Marketplace.

The AWS Marketplace is a trusted storefront of applications, each tested and authorised by Amazon Web Services.  The only difference between a Marketplace AMI and a “normal” AMI is that the Marketplace AMI is tagged for billing. As such, billing for Matillion ETL for Amazon Redshift is via your existing AWS account.

Who are Matillion?


Matillion is a cloud software company based in New York, NY and Manchester, UK. We are an AWS Advanced Technology Partner, Amazon Redshift partner, AWS Marketplace partner and are one of only a handful of ISVs worldwide to hold the AWS Big Data Competency. We were founded in 2011 with customers globally.

 

Supported Databases/Sources

What databases/sources can I load data to and from using Matillion ETL for Amazon Redshift?


Matillion ETL for Amazon Redshift provides a comprehensive set of integrations to a range of databases, services and APIs. These include AWS core services, including Amazon S3, Amazon RDS, Amazon DynamoDb and Amazon EMR; on premise and Cloud databases including MySQL, MS SQL, Oracle, Sybase, PostgreSQL, DB2, DB2 for I and Teradata; FTP, HTTP, HDFS and Windows File Share; REST, JSON and XML APIs; Google BigQuery, MongoDb and Cassandra; social media platforms including Facebook, YouTube and Twitter; files including Microsoft Excel; marketing applications including Google Analytics, Google Adwords, Facebook Ads, Hubspot, Oracle Eloqua and Marketo; CRM platforms including Salesforce.com, Microsoft Dynamics CRM, SugarCRM and Netsuite CRM and internet standards including LDAP, OData.

You can also export/unload data from Redshift to a number of destinations: Amazon S3, MySQL, Amazon Aurora and Mariadb.

Find a complete list of integrations here.

Can I export data from Redshift to other databases using Matillion ETL for Amazon Redshift?


Using the S3 Unload component or the RDS Bulk Output component in Matillion ETL for Amazon Redshift you can push Amazon Redshift tables into Amazon S3, MySQL, Amazon Aurora and Mariadb. In this way Amazon Redshift can be used as the engine of your ETL process, with Amazon RDS or Amazon S3 as the destination for some or all of the finished data.

 

Integration

Can I call/invoke Matillion ETL for Amazon Redshift jobs from external applications?


Yes. You can invoke Matillion jobs and pass in parameters, from external applications, using Amazon SQS or using Matillion’s REST API.

Can I call/invoke external applications/scripts from Matillion ETL for Amazon Redshift jobs?


Yes. Matillion ETL provides components to use and call SQS, SNS, SQL and Python.

Can I use Matillion ETL for Amazon Redshift to run existing/new SQL I have developed separately?


Yes. There are SQL components available in Orchestration and Transformation jobs and you can call programs containing embedded SQL using SQS, SNS, Python or SQL.

Which core AWS services does Matillion integrate with/support?


Matillion ETL for Amazon Redshift integrates with and supports Amazon S3, Amazon RDS, Amazon SQS, Amazon SNS and Amazon CloudWatch. Using the Python or BASH components, Matillion ETL for Amazon Redshift can also interact with the AWS API.

 

 

Functionality

Can I use Matillion ETL for Amazon Redshift to build an Enterprise Data Warehouse?


Yes. Matillion ETL for Amazon Redshift was designed for this purpose and provides all the tools you need to build and maintain an EDW at scale – on its own, or in concert with other technologies.

Can I use Matillion ETL for Amazon Redshift to build a star schema?


Yes. Matillion ETL for Amazon Redshift contains all the components required to build and maintain star-schemas and other data warehouse forms.

Does Matillion ETL for Amazon Redshift support slowly changing dimensions?


Yes. You can see an example of building a slowly changing dimension in this support site article

What data transformation features does Matillion ETL for Amazon Redshift include?


Matillion ETL for Amazon Redshift includes a comprehensive array of data transformation components, each of which leverage underlying Redshift SQL. In short, everything you are likely to need to build and maintain data warehouses and analytical databases in Redshift.

You can see a full list of available transformation components here.

Does Matillion ETL for Amazon Redshift include a scheduler?


Yes. Matillion ETL for Amazon Redshift includes a scheduler which can run Orchestration jobs (which in turn, can run one or more other Orchestration jobs and Transformation jobs). The scheduler can also run external applications (via SQS, SNS or Python). With the Matillion scheduler and Orchestration jobs, you can fully automate your data warehouse ETL/ELT process, as well as choreograph other upstream and downstream activities.

If you have an existing scheduler, you can integrate Matillion ETL for Amazon Redshift into it using SQS or the REST API.

The scheduler includes comprehensive logging, error handling and alerting.

Does Matillion ETL for Amazon Redshift support version control?


Yes. Matillion ETL for Amazon Redshift includes comprehensive version control built in.

You can also use your own 3rd party version control system, such as Git, via Matillion’s REST API.

Can Matillion ETL for Amazon Redshift work across multiple Redshift schemas and clusters?


Yes. Matillion ETL for Amazon Redshift supports multiple schemas in individual jobs. So you can read from schema ‘a’, perform transformation on schema ‘b’ and output to schema ‘c’, in a single job.

Matillion ETL for Amazon Redshift provides the concept, ‘Environments’ where you can define one or more Amazon Redshift clusters, then choose which cluster your jobs run against e.g. Dev, Test, Live.

Environments also provide variables that can be reference in your jobs to allow for soft-coding of, for instance, table names, database connection details, S3 bucket/file names, parameters in Python or SQL, SQS queue names, SNS topic names (or anything else).

Can I deliver a real-time data warehouse/BI solution using Matillion ETL for Amazon Redshift?


Yes. In conjunction with a suitable mechanism for delivering your data either into S3 or into Redshift, you can use Matillion ETL for Amazon Redshift to deliver a near real-time data warehouse.

Does Matillion ETL for Amazon Redshift support the use of variables and soft-coding?


Yes. You can define variables and use them in any parameter or script in your jobs. Variables can be populated either from Orchestration components (e.g. read from a database table as part of a Table Iteration), from an external integration (an SQS message with variable key-value pairs) or manually configured into the Environment configuration.

Can Matillion ETL for Amazon Redshift co-exist with/make use of my existing SQL/Python/Bash scripts etc?


Yes. There are SQL components available in Orchestration and Transformation jobs and you can call programs containing embedded SQL using SQS, SNS, Python, Bash or SQL.

You can call Python (or other) programs using SQS or SNS and you can run or call Python from the Python component.

Which Redshift SQL features does Matillion ETL for Amazon Redshift support?


The goal and usual status is that Matillion ETL for Amazon Redshift supports all Amazon Redshift SQL features through the components it provides. Sometimes these are a direct componentization of a Redshift feature (e.g. Filter). Sometimes, more complex components leverage multiple Redshift SQL statements to deliver a particular function (e.g. Rank, Lead/Lag).

When new Redshift features are introduced by AWS, we add these into the product as fast as possible (usually, a few weeks). Where a component isn’t available for a Redshift command you want to use, the SQL component allows you direct access to use Redshift SQL.

You can view the SQL any Matillion component creates in the SQL tab for that component.

 

Performance

I have a lot of data. Can Matillion ETL for Amazon Redshift scale to handle billions of rows?


Yes. Matillion ETL for Amazon Redshift scales to very large data volumes and delivers incredible speed when compared to traditional ETL tools. The work of transforming data is actually done by Amazon Redshift itself, as Matillion is an ELT tool. As such, it’s a just as scalable as Amazon Redshift – so near linearly and to petabytes of data.

How fast is Matillion ETL for Amazon Redshift?


Fast.

Speed is important operationally (reducing ETL run times and allowing faster data refreshes). It’s also important at development time. In traditional ETL tools there’s a well established process of: build some of your job – run it to see if it works – wait for it to grind – find an error – repeat. This process is slow, so makes your development/maintenance slower than it needs to be. In Matillion ETL for Amazon Redshift, you can see data being processed in-job, in real time, so there’s no “build-test-wait to grind” cycle. We find this reduces ETL development and maintenance effort by around 50%.

How does Matillion ETL for Amazon Redshift scale? Do I need a bigger Matillion instance to process more data?


You do not need a bigger Matillion ETL instance to Transform more data, faster.  The actual work of transforming data is done by the Amazon Redshift cluster, so if you need to process more data faster in your transformation, you make your Redshift cluster bigger.

The Matillion instance size governs the maximum number of ETL developers/administrators that can concurrently use the tool and also the maximum number of saved projects/environments. Also, larger instance sizes can Load more data faster.  You can read about choosing the right instance size on this page.

 

Installation

How do I install Matillion ETL for Amazon Redshift?


To install Matillion ETL for Amazon Redshift, go to the AWS Marketplace (via aws.amazon.com/marketplace or via your EC2 console), find ‘Matillion ETL for Amazon Redshift’ and follow the instructions to launch.

The product is delivered as an AMI, straight into your AWS environment. It’s your instance running in your VPC.

You can go straight to our Marketplace page here and launch the product immediately. There is a free 14 day trial.

Note: Follow the instructions provided in ‘Launching the Product’ to ensure your instance is correctly setup. It’s a simple process which takes less than 5 minutes.

Note: If, for some reason, you cannot use the AWS Marketplace (e.g. you are in a locked-down environment) you can contact support@matillion.com and we can arrange a direct install.

Do I require dedicated hardware or AWS infrastructure for Matillion ETL for Amazon Redshift?


No. The product runs in a self-contained AMI which includes all the functionality you need to run the product. The only other thing you need is Amazon Redshift, though the product will also connect to and make use of Amazon S3, Amazon RDS, Amazon EMR, Amazon SQS, Amazon SNS and Amazon CloudWatch.

Which AWS services do I require to support Matillion ETL for Amazon Redshift?


There are no services required to run Matillion ETL for Amazon Redshift apart from the AMI (and consequently no Cloud Formation template or setup prerequisites are required either).

Of course it’s highly likely that you’ll want an Amazon Redshift cluster setup. This/these can be setup before or after you install Matillion.

 

Security/Backups

Is Matillion ETL for Amazon Redshift secure?


Yes. It runs inside your VPC, just like any other AMI. It’s scanned and audited by AWS before listing on the AWS Marketplace, to ensure it complies with AWS rules and is free from viruses and malware. It supports SSL connections to Amazon Redshift and source databases and you can configure user security for the product using an authentication mechanism of your choice.

Do I have to connect my systems to Matillion or share data with you, to use Matillion ETL for Amazon Redshift?


No. Matillion ETL for Amazon Redshift is delivered via AMI. You get your own dedicated instance running in your own EC2 account/VPC. You have root access to the instance that Matillion runs on and no-one (including Matillion or AWS) has any access to that instance.

How can I be sure your software is free of viruses/malware or malicious code, or code that allows you to see my data?


AMIs undergo both automated and manual scans, conducted by AWS, before being made available on the AWS Marketplace, to ensure compliance with AWS policies. The relevant policy includes:

AMIs MUST NOT contain any known vulnerabilities, malware or viruses. 2. AMIs MUST NOT contain default passwords, auth keys, key pairs, security keys or other credentials for any reason. All instance authentication must use key pair access rather than password based auth, even if the password is generated, reset or defined by the user at launch. 3. Marketplace AMIs must not allow password authentication. Disable password authentication via your sshd_config file by setting the PasswordAuthentication to no. 6.2. Accessibility 1. Linux-based AMIs MUST lock/disable root login and allow only sudo access through a user account (not “root”). Sudo allows you to control which users are allowed to perform root functions and logs the activity so that there is an audit trail. 2. AMIs MUST allow OS-level administration capabilities to allow for compliance requirements, vulnerability updates and log file access. For Linux-based AMIs this is through SSH, and for Windows based AMIs this is normally through RDP. 3. Linux-based AMIs MUST NOT have blank or null root passwords. 4. AMIs MUST NOT contain Authorized Passwords or Authorized Keys 5. AMIs MUST NOT use default passwords for user interface access. It is recommended to use a randomization process such as using the instance_id from the AWS EC2 Metadata Service. 6. Windows-based AMIs MUST a. Use the most recent version of Ec2ConfigService b. ENABLE “Ec2SetPassword”, “Ec2WindowsActivate” and “Ec2HandleUserData” c. Remove Guest Accounts or Remote Desktop Users (none are allowed) 7. The seller MUST NOT maintain access to the customer’s running instances. The customer has to explicitly enable any outside access, and any accessibility built into the AMI must be off by default.

 

Pricing/Commercial

How much does Matillion ETL for Amazon Redshift cost?

Matillion ETL for Amazon Redshift is cost effective and easy to pay for, hourly or annually, via your existing AWS account.  There are 3 different instance sizes to choose from, each of which supports a maximum number of concurrent users, a maximum number of saved environments and offers different performance.  Larger instances deliver higher performance, particularly data load performance.

You can pay for Matillion ETL for Amazon Redshift by the hour, or annually.  Hourly billing charges whenever the Matillion ETL for Amazon Redshift instance is started in your EC2 console and is ideal for proof of concepts, trials and projects where you do not need to use the instance all the time.  Annual customers pay a one-off annual subscription for unlimited use of the Matillion ETL for Amazon Redshift instance and enjoy direct access to Matillion’s support engineers via phone and e-mail.  Annual subscriptions are ideal for production workloads, as it works out cheaper than running an hourly instance 24 x 7. Also where a direct relationship is required or valuable.

Hourly prices start from $1.37 p/hour.  For more information on pricing, visit the pricing page.

 

Where do I buy Matillion ETL for Amazon Redshift?


You can launch and buy Matillion ETL for Amazon Redshift directly from the AWS Marketplace and billing is via your existing AWS account .  If you are unable to transact via AWS MArketplace, contact sales@matillion.com.

How do I pay for Matillion ETL for Amazon Redshift?


Billing for AWS Marketplace products, like Matillion ETL for Amazon Redshift, is via your existing AWS account. You do not need to re-enter credit card details. Matillion ETL for Amazon Redshift will appear as a line item on your AWS invoices, as for any other AWS service.

What instance size do I need?


You can choose an instance size based on your requirements. You can read about choosing the right instance size on this page, but in shorthand: t2.medium supports teams of 1-2 ETL developers (ideal for POCs). m4.large supports teams of up to 5 ETL developers and m4.xLarge is for teams of up to 12 ETL developers. Larger instances also load data at faster rates.

How many instances do I need?


You only need one instance of Matillion ETL for Amazon Redshift (and one instance can connect to multiple Amazon Redshift clusters), however you may choose to use more than one instance, for example to support two separate projects, or an instance for development and another for production. Purchase extra instances as required via the AWS Marketplace, either hourly or annually.

If you’re unsure, contact support@matillion.com for advice.

 

Support

What support is available with the product?


Matillion provide comprehensive support to our Free Trial, Hourly and Annual Customers including monitored forums, detailed documentation and component guides, quick start guides, FAQs, and instructional videos. Free Trial and Annual customers may also get in touch via email and phone.

Access the support portal here or email support@matillion.com.  Read more about services and support here.

Read what Matillion customers have to say about our support and product on our AWS Marketplace reviews page.

I need assistance using the product. Do you offer services?


Yes. We have a services team available for training, setup, proof-of-concepts and full builds.  Read more about services and support here.

Is there a service partner network for Matillion ETL for Amazon Redshift?


Yes, if you would like to be introduced to a Matillion services partner, or are interested in becoming a partner, e-mail us at sales@matillion.com