CSV, AVRO, JSON, Delimited and Fixed Width Text Integration to Amazon Redshift
- High performance data load
- Simple configuration
- Pull text files from S3, FTP, SFTP, HTTP/S, Windows File Share or HDFS
- Load and transform on arrival – as soon as a file lands in an S3 bucket
- Create table schemas automatically from sampled text data
- Support bzip2, gzip and lzop compression
- Schedule data load and transformation
- Combine with other data, processes and services in an intuitive ELT interface
S3 Put and S3 Load Components
The S3 Put component allows you to copy data from any remote source – FTP, SFTP, HTTP/S, Windows File Share or HDFS – into an S3 bucket, as part of a Matillion orchestration job. Jobs can be scheduled, triggered by an event or queue message, and orchestrated with other load and transform processes.
The S3 Load component allows you to load CSV, AVRO, JSON, Delimited and Fixed Width format text into an Amazon Redshift table as part of a Matillion integration job. Automatically define and create table schemas from sampled data. Set distribution and sort keys. Combine with other load and transform processes.
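As an illustration of what defining a table schema from sampled data can involve, here is a minimal Python sketch. It is not Matillion's implementation: the function names `infer_type` and `create_table_sql`, the type-widening rules and the VARCHAR sizing are all assumptions chosen for the example. Only the `DISTKEY` and `SORTKEY` table attributes are standard Redshift `CREATE TABLE` syntax.

```python
import csv
import io


def infer_type(values):
    """Pick a Redshift column type that fits every sampled value (hypothetical rules)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "VARCHAR(256)"  # nothing to learn from; fall back to a generic text column
    try:
        for v in non_empty:
            int(v)
        return "BIGINT"
    except ValueError:
        pass
    try:
        for v in non_empty:
            float(v)
        return "DOUBLE PRECISION"
    except ValueError:
        pass
    # Text column: leave headroom beyond the longest sampled value (assumed policy).
    longest = max(len(v) for v in non_empty)
    return f"VARCHAR({max(longest * 2, 16)})"


def create_table_sql(table, sample_text, distkey=None, sortkeys=()):
    """Build a CREATE TABLE statement from a CSV sample whose first row is a header."""
    reader = csv.reader(io.StringIO(sample_text))
    header = next(reader)
    rows = list(reader)
    columns = []
    for i, name in enumerate(header):
        values = [row[i] for row in rows if i < len(row)]
        columns.append(f"    {name} {infer_type(values)}")
    sql = f"CREATE TABLE {table} (\n" + ",\n".join(columns) + "\n)"
    if distkey:
        sql += f"\nDISTKEY({distkey})"
    if sortkeys:
        sql += f"\nSORTKEY({', '.join(sortkeys)})"
    return sql + ";"


sample = "order_id,amount,customer\n1,9.99,alice\n2,12.50,bob\n"
print(create_table_sql("orders", sample, distkey="order_id", sortkeys=["order_id"]))
```

A small sample can mislead: a column that looks numeric in the first rows may contain text later in the file, so a real load would typically sample more rows or allow the inferred types to be overridden.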
The S3 Unload and Text Output components allow you to export CSV, Delimited, Fixed Width and Escaped data from an Amazon Redshift table to an S3 bucket and from there to any remote target – FTP, SFTP, HTTP/S, Windows File Share or HDFS – as part of an orchestration job.
File Iterator, Manifest and scripting components complete the suite of tools to manage your text data load orchestration.
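For context, a Redshift COPY manifest is a small JSON file listing the S3 objects to load. The sketch below builds one in Python. The function name `build_manifest` and the bucket and key names are hypothetical, and whether the Manifest component produces exactly this output is an assumption. The `entries`, `url` and `mandatory` fields follow the manifest format documented for Redshift COPY.

```python
import json


def build_manifest(bucket, keys, mandatory=True):
    """Return a Redshift COPY manifest (JSON text) listing the given S3 objects."""
    entries = [
        {"url": f"s3://{bucket}/{key}", "mandatory": mandatory}
        for key in keys
    ]
    return json.dumps({"entries": entries}, indent=2)


# Hypothetical bucket and object keys, for illustration only.
print(build_manifest("example-bucket", ["orders/part-001.csv", "orders/part-002.csv"]))
```

Setting `mandatory` to true asks COPY to fail if a listed file is missing, which is usually the safer choice when every file in the batch is expected to arrive.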
Features and Benefits
The S3 Put and S3 Load components in Matillion ETL for Amazon Redshift deliver fast data load performance and simple configuration, whilst being extensible to the most sophisticated data load and transform requirements.
- Fast – load and transform CSV, AVRO, JSON, Delimited and Fixed Width Text data into Redshift in a drag-and-drop interface
- Import and export data from/to FTP, SFTP, HTTP/S, Windows File Share or HDFS
- Flexible – iterate through multiple files, build manifests, use events and schedules, add Python and Bash scripts
- Supports bzip2, gzip and lzop compression
- Implements Redshift best practice. Define table schemas automatically. Specify distribution style and sort keys
- Standalone data load or sophisticated integration – combine text file data with data from other databases and systems. Integrate with other AWS services including RDS, S3, DynamoDB, EMR, Kinesis, SQS and SNS
- Monitor load status. Comprehensive logging, audit and alerting features
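To show how the file formats and compression options listed above relate to Redshift itself, the following Python sketch assembles a Redshift `COPY` statement. It only illustrates the underlying SQL and does not claim to be the statement the S3 Load component generates. The function name `copy_sql`, the table name, the bucket and the IAM role ARN are placeholders. The sketch covers CSV and JSON only; other formats are left out to keep it short.

```python
# Redshift COPY keywords for the compression formats named in the list above.
COMPRESSION = {"gzip": "GZIP", "bzip2": "BZIP2", "lzop": "LZOP"}

# A subset of COPY data format clauses (CSV and JSON only in this sketch).
FORMATS = {
    "csv": "FORMAT AS CSV",
    "json": "FORMAT AS JSON 'auto'",
}


def copy_sql(table, s3_url, iam_role, fmt="csv", compression=None):
    """Build a COPY statement that loads S3 objects into a Redshift table."""
    parts = [
        f"COPY {table}",
        f"FROM '{s3_url}'",
        f"IAM_ROLE '{iam_role}'",
        FORMATS[fmt],
    ]
    if compression:
        parts.append(COMPRESSION[compression])
    return "\n".join(parts) + ";"


# Placeholder values for illustration only.
print(copy_sql(
    "orders",
    "s3://example-bucket/orders/",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    compression="gzip",
))
```

Looking the format and compression names up in dictionaries means an unsupported value raises a `KeyError` instead of silently producing an invalid statement.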