How to Manage Amazon Redshift Clusters Using Shared Jobs
In the world of analytics, the workload typically requires a data warehouse to be available 24 hours a day, 7 days a week. However, there may be times when you need an Amazon Redshift cluster for a short duration of time at frequent (or infrequent) intervals.
For example, you may run a periodic ETL job or use a cluster for testing and development and not use it during off-hours or weekends. In these cases, you may want an easy way to run the data warehouse part-time. Previously, you could accomplish this by making a backup, terminating the cluster, and restoring the cluster from the snapshot.
Manage Amazon Redshift Clusters Using Shared Jobs
Within Matillion ETL, users can build Shared Jobs, which are reusable templates. A recent update to Amazon Redshift introduced additional actions, in particular for cluster management. To help you get started with them, we have created Shared Jobs that provide a simple way to suspend billing when your Amazon Redshift cluster is out of operation for hours at a time, especially when that downtime follows a regular schedule.
Let’s walk through the new Shared Jobs available within your Matillion ETL instance, which provide quick and easy templates for Amazon Redshift cluster management.
Setting up a new IAM Role
In order to make use of the available Shared Jobs, you will need to configure your IAM role to give Matillion ETL the required permissions to access your Amazon Redshift data warehouse.
An IAM Role allows your instances to use the Amazon API securely without manual management of security keys. This procedure assumes you do not already have an appropriate IAM role set up (if you do, simply select it and modify it if required).
1. Click Create New IAM Role, then click the blue Create New Role button.
2. Select AWS service as the trusted entity and choose the EC2 use case.
3. Create a policy with the following JSON then attach it to the IAM role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "*"
    }
  ]
}
4. Attach the AmazonRedshiftFullAccess policy to the role, or edit the role to grant bespoke permissions. The allowed actions should match the operations you intend to run against Redshift. Here is a list of the recommended write actions:
- CancelResize
- CopyClusterSnapshot
- CreateCluster
- CreateClusterSnapshot
- CreateScheduledAction
- DeleteClusterSnapshot
- ModifyCluster
- ModifyClusterSnapshot
- ModifySnapshotSchedule
- PauseCluster
- RebootCluster
- ResizeCluster
- RestoreFromClusterSnapshot
- ResumeCluster
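If you prefer bespoke permissions over AmazonRedshiftFullAccess, the list above can be turned into a policy document. The following is a minimal sketch in Python that only builds and prints the JSON; the helper name `build_redshift_policy`, the `Sid` value, and the wildcard `Resource` are illustrative assumptions, and you may want to narrow the resource to specific cluster ARNs.

```python
import json

# Write actions recommended in the list above.
RECOMMENDED_ACTIONS = [
    "CancelResize", "CopyClusterSnapshot", "CreateCluster",
    "CreateClusterSnapshot", "CreateScheduledAction", "DeleteClusterSnapshot",
    "ModifyCluster", "ModifyClusterSnapshot", "ModifySnapshotSchedule",
    "PauseCluster", "RebootCluster", "ResizeCluster",
    "RestoreFromClusterSnapshot", "ResumeCluster",
]


def build_redshift_policy(actions, resource="*"):
    """Return an IAM policy document that allows the given Redshift actions.

    Hypothetical helper for illustration; "*" is a placeholder resource.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "MatillionRedshiftClusterManagement",  # illustrative name
                "Effect": "Allow",
                # IAM action names carry the service prefix "redshift:".
                "Action": [f"redshift:{name}" for name in actions],
                "Resource": resource,
            }
        ],
    }


policy = build_redshift_policy(RECOMMENDED_ACTIONS)
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the policy editor in the IAM console and attached to the role alongside the iam:PassRole policy from step 3.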
Now that you have set up your IAM role to work with Matillion ETL, we can switch back over to our Matillion ETL instance to review the Shared Jobs. If you are not familiar with Shared Jobs, make sure you check out our blog on “What is a Shared Job” to learn more.
Accessing Shared Jobs in Matillion ETL
We already provide some Matillion Shared Jobs via the Shared Jobs panel at the lower left of the client, under Read-Only > Matillion.
To assist our Matillion ETL for Amazon Redshift users, we have created new Shared Jobs, which you will be able to use if you are running the latest version of Matillion ETL (v1.46). You will see four new folders in the Shared Jobs panel. All new Shared Jobs are described below.
Redshift Scheduled Actions
Within this group we have added some Shared Jobs that are available to be used specifically with Amazon Redshift. They make use of the supported scheduled functions in Redshift. This gives you the ability to trigger a supported Amazon Redshift API operation on a schedule for a specific cluster from within Matillion.
The Redshift Scheduled Actions Shared Jobs are:
- Redshift – Schedule Resize Cluster – The ability to schedule a resize of a specific cluster via the Amazon Redshift API from within Matillion ETL
- Redshift – Schedule Pause or Resume – The ability to schedule a ‘Pause’ or ‘Resume’ of a specific cluster via the Amazon Redshift API from within Matillion ETL
- Delete A Scheduled Action – Delete a scheduled action
With these Shared Jobs, you will need to configure the variables to specify which cluster to manage, which action to apply, and when to schedule it. The scheduled action then runs under the IAM role. This IAM role must grant permission for both:
- The Amazon Redshift API operation in the scheduled action
- The Amazon Redshift scheduler to assume the role on your behalf. You can apply this by selecting the Redshift Scheduler use case when creating the role
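To make the two requirements above concrete, here is a hedged sketch of what sits behind them at the AWS level. It is not how the Shared Jobs are implemented (the source does not describe that); it only shows the trust policy that lets the Redshift scheduler service assume a role, and the shape of a CreateScheduledAction request. The function names, cluster name, and role ARN are placeholders.

```python
import json


def scheduler_trust_policy():
    """Trust policy that lets the Redshift scheduler service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "scheduler.redshift.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def pause_resume_schedule(name, cluster_id, action, cron, iam_role_arn):
    """Build parameters for a Redshift CreateScheduledAction request.

    `action` must be "PauseCluster" or "ResumeCluster". `cron` uses the
    Redshift field order: minutes, hours, day-of-month, month, day-of-week,
    year (times are in UTC).
    """
    if action not in ("PauseCluster", "ResumeCluster"):
        raise ValueError(f"unsupported action: {action}")
    return {
        "ScheduledActionName": name,
        "TargetAction": {action: {"ClusterIdentifier": cluster_id}},
        "Schedule": f"cron({cron})",
        "IamRole": iam_role_arn,
    }


# Placeholder values: pause a development cluster at 20:00 UTC on weekdays.
request = pause_resume_schedule(
    name="pause-dev-evenings",
    cluster_id="dev-cluster",
    action="PauseCluster",
    cron="0 20 ? * MON-FRI *",
    iam_role_arn="arn:aws:iam::123456789012:role/example-scheduler-role",
)
print(json.dumps(scheduler_trust_policy(), indent=2))
print(json.dumps(request, indent=2))
```

If you were calling AWS directly, for example with boto3, a dictionary like `request` could be passed as keyword arguments to the Redshift client's `create_scheduled_action` method. Within Matillion ETL, the equivalent values are supplied through the Shared Job variables.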
Redshift Cluster Actions
Along with the available scheduled actions, there are several templated Shared Jobs that make use of some of the other actions in Amazon Redshift that can’t be scheduled. You can use these to manage your Redshift cluster from within Matillion ETL.
One of these Shared Jobs – Pause Resume or Reboot Cluster – covers several of the most common cluster actions. This shared job lets you run one of the following actions immediately on a given Redshift cluster from within Matillion:
- Pause-cluster
- Resume-cluster
- Reboot-cluster
There are three other Shared Jobs that use common Redshift actions to make it easy to manage your Redshift cluster from within Matillion ETL:
- Resize Cluster – Change the cluster’s type, or the number or type of its nodes
- Cancel Resize Cluster – Cancel a resize operation for a cluster
- Create Cluster – Create a cluster with the minimal parameters
You will need to configure some of the variables to specify which cluster to manage and which action to apply.
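As an illustration of the "which cluster, which action" choice, the sketch below dispatches the pause, resume, and reboot actions to the Redshift client methods of the same name in boto3 (`pause_cluster`, `resume_cluster`, `reboot_cluster`). It is an assumption about how one might script this outside Matillion, not a description of the Shared Job itself. `run_cluster_action` and `RecordingClient` are hypothetical names, and the recording client stands in for a real client so the example runs without AWS credentials.

```python
# Maps the action names used above to boto3 Redshift client method names.
ACTION_METHODS = {
    "pause-cluster": "pause_cluster",
    "resume-cluster": "resume_cluster",
    "reboot-cluster": "reboot_cluster",
}


def run_cluster_action(client, action, cluster_id):
    """Run one immediate action against the named cluster."""
    if action not in ACTION_METHODS:
        raise ValueError(f"unsupported action: {action}")
    method = getattr(client, ACTION_METHODS[action])
    return method(ClusterIdentifier=cluster_id)


class RecordingClient:
    """Stand-in client that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Only reached for attributes not found normally, i.e. API methods.
        def call(**kwargs):
            self.calls.append((name, kwargs))
            return {"recorded": name}
        return call


client = RecordingClient()  # with AWS access: boto3.client("redshift")
run_cluster_action(client, "pause-cluster", "dev-cluster")
run_cluster_action(client, "resume-cluster", "dev-cluster")
print(client.calls)
```

Rejecting unknown action names before touching the client keeps a mistyped variable from reaching the API.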
Redshift Manage Usage Limits
Outside of the common actions, we have also introduced some Shared Jobs to help manage usage limits. You can define limits to monitor and control your usage and the associated cost of some Amazon Redshift features. You can create daily, weekly, and monthly usage limits, and define actions for Amazon Redshift to take automatically if those limits are reached. For example, from within Matillion you can direct Amazon Redshift to log an event to a system table to record usage exceeding your defined limits.
These shared jobs give you the ability to run one of the following actions on a given Redshift cluster from within Matillion:
- Create Usage Limit – Create usage limits on a cluster
- Modify Usage Limit – Modify a usage limit in a cluster
- Delete Usage Limit – Delete a usage limit from a cluster
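For orientation, a usage limit in the Redshift API combines a feature, a limit type, an amount, a period, and a breach action. The sketch below builds and validates the parameters for a CreateUsageLimit request. The helper name `usage_limit_request` and the example values are assumptions for illustration, and the Shared Job variables may be named differently.

```python
VALID_PERIODS = {"daily", "weekly", "monthly"}
VALID_BREACH_ACTIONS = {"log", "emit-metric", "disable"}

# Feature and limit-type pairs covered by this sketch:
# Redshift Spectrum is limited by data scanned (amount in terabytes),
# concurrency scaling by time (amount in minutes).
FEATURE_LIMIT_TYPES = {
    "spectrum": "data-scanned",
    "concurrency-scaling": "time",
}


def usage_limit_request(cluster_id, feature_type, amount,
                        period="monthly", breach_action="log"):
    """Build parameters for a Redshift CreateUsageLimit request."""
    if feature_type not in FEATURE_LIMIT_TYPES:
        raise ValueError(f"unsupported feature type: {feature_type}")
    if period not in VALID_PERIODS:
        raise ValueError(f"unsupported period: {period}")
    if breach_action not in VALID_BREACH_ACTIONS:
        raise ValueError(f"unsupported breach action: {breach_action}")
    if amount <= 0:
        raise ValueError("amount must be positive")
    return {
        "ClusterIdentifier": cluster_id,
        "FeatureType": feature_type,
        "LimitType": FEATURE_LIMIT_TYPES[feature_type],
        "Amount": amount,
        "Period": period,
        "BreachAction": breach_action,
    }


# Placeholder values: log an event once Spectrum scans 5 TB in a week.
limit = usage_limit_request("dev-cluster", "spectrum", 5, period="weekly")
print(limit)
```

The "log" breach action corresponds to the example in the text, where Redshift records the breach in a system table rather than disabling the feature.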
Redshift Manage Cluster Snapshot
Finally, we have also added some jobs to assist with managing cluster snapshots. Snapshots are point-in-time backups of a cluster. You can take a manual snapshot any time. By default, manual snapshots are retained indefinitely, even after you delete your cluster. You can specify the retention period when you create a manual snapshot, or you can change the retention period by modifying the snapshot.
- Create Cluster Snapshot – Create a manual snapshot of a specified cluster
- Modify Cluster Snapshot – Modify the settings for a snapshot
- Delete Cluster Snapshot – Delete a specific manual snapshot
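The retention behaviour described above can be sketched as request parameters for the Redshift CreateClusterSnapshot operation. In that API, a ManualSnapshotRetentionPeriod of -1 keeps the snapshot indefinitely, and other values are a number of days from 1 to 3653. The helper name `snapshot_request` and the identifiers are hypothetical.

```python
def snapshot_request(cluster_id, snapshot_id, retention_days=None):
    """Build parameters for a Redshift CreateClusterSnapshot request.

    retention_days=None omits the retention setting. -1 keeps the snapshot
    indefinitely; otherwise the value must be between 1 and 3653 days.
    """
    params = {
        "SnapshotIdentifier": snapshot_id,
        "ClusterIdentifier": cluster_id,
    }
    if retention_days is not None:
        if retention_days != -1 and not 1 <= retention_days <= 3653:
            raise ValueError("retention_days must be -1 or between 1 and 3653")
        params["ManualSnapshotRetentionPeriod"] = retention_days
    return params


# Placeholder values: keep a pre-resize snapshot for 30 days.
print(snapshot_request("dev-cluster", "dev-before-resize", retention_days=30))
```

Setting an explicit retention period on routine snapshots avoids accumulating storage charges for backups that would otherwise outlive the cluster.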
These new Shared Jobs within Matillion ETL simplify the administration and configuration of your Amazon Redshift clusters from within Matillion. Some jobs can be scheduled to run at a specific time in the future, while others can be called on demand as part of another job. We hope they help you better manage Amazon Redshift clusters and optimize your use of Amazon Redshift.
For more about the new features in Matillion ETL v1.46, including our new Assert components and other new features and improvements, check out our recent release blog.