Amazon Redshift Serverless Integration with Matillion

Amazon Redshift Serverless Integration with Matillion

Amazon Redshift Serverless integration with Matillion enables users to build and orchestrate scalable data pipelines without the need to manage Redshift infrastructure, combining the flexibility of serverless architecture with the power of low-code data transformation.

Key Takeaways:

  • Redshift Serverless eliminates manual cluster management and scales compute dynamically.
  • Namespaces and Workgroups structure the Serverless model, controlling compute and access.
  • Matillion seamlessly integrates with Redshift Serverless, enabling scalable ETL workflows.
  • Network best practices include deploying Redshift in private subnets and configuring security groups properly.
  • IAM roles are required for Redshift Serverless to interact with AWS services like S3.
  • Matillion requires proper credentials to connect, including Workgroup endpoints and AWS permissions.With Redshift Serverless:
  • Compute and storage are decoupled.
  • The compute layer automatically scales resources based on workload demand.
  • The storage layer leverages AmazonS3 for durable and scalable data storage.
  • Users only pay for compute when it is actively in use.

What is Amazon Redshift Serverless?

Amazon Redshift Serverless is a cloud-based data warehousing solution that eliminates the need for manual provisioning and cluster management. For AWS, this is a significant departure from the traditional Redshift Provisioned model.

With Redshift Provisioned:

  • Compute and storage are tightly coupled.
  • The compute layer requires manual node selection and does not scale automatically.
  • The storage layer is attached directly to the cluster.
  • Users pay for compute continuously, even when idle.

Redshift Serverless v Redshift Provisioned: Key Differences

FeatureRedshift ServerlessRedshift Provisioned
ComputeScales automaticallyRequires upfront node selection
StorageAmazon S3Attached to the cluster nodes
CostPay-per-use compute
Separate storage costs
Continuously charged when running
ManagementFully managed with no infrastructure provisioningRequires cluster sizing and management
Startup TimeInstant scalingClusters can be powered on/off but with some latency

To introduce a serverless mode alongside classic Redshift Provisioned, AWS introduced the concepts of Namespaces and Workgroups.

  • Namespaces are logical containers that group database objects and users, as well as schemas, tables, data shares, and snapshots. This allows for resource organization and management. A namespace represents the data managed by Redshift Serverless.
  • Workgroups are a set of compute resources that execute queries. They can be configured with a base capacity and scaling options to meet changing workload requirements. Workgroups have network and security settings that can be configured for role-based access control. A workgroup represents the compute power that gives access to a Namespace.

To connect to Amazon Redshift Serverless, you must choose a Workgroup that has been associated with a Namespace. Together this is how these two concepts provide resource management and access control.

Setting up Matillion for Redshift Serverless

Configuring Workgroups and Namespaces

To successfully connect, ensure:

  • A Workgroup is deployed and linked to a Namespace.
  • Your Matillion instance is in the same AWS region as Redshift Serverless.
  • The correct security configurations are in place (covered below).

Redshift Serverless Subnet Best Practices

Network Configuration for Redshift Serverless

A Redshift Serverless Workgroup must be deployed into at least three subnets. There is no need to expose the Workgroup to the internet, so these can all be private subnets. A NAT Gateway is optional, and you don't need it if your private subnets have a local route to the rest of the rest of the VPC.

Protect your Workgroup with a Network Security Group that permits your Matillion infrastructure to communicate with it on the designated port.

Matillion Network Best Practices

  • For Data Productivity Cloud users: Within the same VPC, place your Matillion Hybrid SaaS agent either in a public subnet, or a private subnet with a NAT Gateway. It needs to initiate outbound communication with Matillion's control plane infrastructure.
  • For Matillion ETL users: Instances can be placed in a public subnet, with an Internet Gateway, or behind a load balancer. They need to be accessible so Matillion ETL users can log in.

Setting Up Redshift Serverless: Subnet Architecture Guidelines

Redshift Serverless Matillion Environment Credentials

AWS automatically provides a database superuser named admin with Redshift Serverless. 

While you can connect using this user, it has excessive privileges. Instead, it's recommended to create a dedicated user with permissions only for the necessary database.

When connecting to Redshift Serverless you will need:

  • Endpoint: The network address of your Workgroup. Copy this from the Workgroup configuration panel in your Redshift Serverless console. Excludejdbc:redshift:// part at the beginning, and:5439/dev at the end.
    • For example wg-name.123456789012.eu-west-1.redshift-serverless.amazonaws.com
  • Port: just the numeric port number allocated to the Workgroup.
    • For example 5439
  • Username and Password: As configured in your Redshift Serverless environment.
  • AWS credentials (via secret keys or an instance profile attached to your EC2 or ECS compute) that allow:
  • Access your S3 buckets
  • Use of redshift:DescribeClusters, to check available IAM roles. (If no roles appear in Matillion’s dropdown, you can manually override the value.)

Redshift Serverless Namespace Permissions

Amazon Redshift Serverless requires permission to access AWS services such as S3. The recommended method is assigning an IAM role to the Namespace. 

Configuring IAM Roles for Matillion Integration

  1. Create an IAM Role using AWS IAM.
  2. Attach the following permissions:
    1. a trust relationship with redshift-serverless.amazonaws.com and redshift.amazonaws.com
    2. The AmazonRedshiftAllCommandsFullAccess policy.
    3. Any additional policies required (e.g., for S3 access)

Testing IAM Role Permissions

To test that the permissions are working, run a COPY command like the one below to load some CSV data from S3 into a database table:

COPY "stg_mydata"
FROM 's3://bucket-name/data/csv/mydata.csv'
iam_role 'arn:aws:iam::123456789012:role/redshift-serverless-role'
FORMAT AS CSV
IGNOREHEADER 1

The same IAM role ARN can be used in the IAM Role ARN property of your Matillion S3 Load components.

Conclusion

Amazon Redshift Serverless gives your data processing workloads instant access to highly scalable compute power, which you only pay for while it's actually running.

As an Amazon Redshift technology partner, Matillion seamlessly integrates with AWS and Redshift Serverless. The Data Productivity Cloud additionally interfaces directly with LLMs, democratizing access to AI for all your data needs.

Discover the power of Matillion's Ultimate ETL with flexible pricing options designed to match your strategic ambitions. Start your journey today with a free trial or book a personalized demo to see how Matillion can become your invaluable data asset.

Ian Funnell
Ian Funnell

Data Alchemist

Ian Funnell, Data Alchemist at Matillion, curates The Data Geek weekly newsletter and manages the Matillion Exchange.
Follow Ian on LinkedIn: https://www.linkedin.com/in/ianfunnell

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.