- 04.28.2025
Where to store your data: Amazon Redshift vs. S3
If you’re building data pipelines on AWS, you’ve probably asked yourself, “Where should this data live?” When weighing up Amazon Redshift vs. S3, there are benefits to both. The short answer? Your data should probably live in both.
Key Takeaways:
- Use Amazon S3 for cost-efficient, scalable storage.
- Use Redshift when performance and fast analytics matter.
- Use both when you need to scale storage and keep compute lean. Redshift Spectrum lets you query S3 without moving data.
Amazon S3 and Amazon Redshift serve different purposes, but each is increasingly complementary to the other. And thanks to tools like Redshift Spectrum and Matillion, you can now blur the lines between data lake and data warehouse to get the best of both worlds.
Matillion is a data integration platform that enables the orchestration and management of these hybrid pipelines seamlessly, whether working with data in S3, Redshift, or both.
In this article, we’ll break down when to use each, show how Redshift Spectrum fits in, and walk through an example using IoT data. You’ll also see how Matillion can orchestrate hybrid pipelines that give you flexibility without adding complexity.
S3 vs. Redshift: At a Glance
Before we dive in, here’s a quick side-by-side summary of Amazon Redshift vs. S3, to clearly display their core capabilities and differences.
| Feature | Amazon S3 | Amazon Redshift |
| What it does | Object storage service | Fully managed data warehouse |
| Best for | Storing raw, semi-structured, or unstructured data | Working with large amounts of structured data using SQL |
| Cost | Cheap storage billed on a pay-per-use basis | More expensive storage, but high-performance queries |
| Performance | High latency, slower queries (via Spectrum) | Optimized for analytical performance |
| Integration | Easy with many AWS services | Tight integration with BI & analytics tools |
Amazon Redshift vs. S3: Choosing The Right Storage Strategy
When deciding between S3, Redshift, or Spectrum, the key is to pick the best tool for the job at hand, rather than simply picking a ‘winner.’
With tools like Matillion, you can easily orchestrate hybrid pipelines that combine the best of each, allowing you to take advantage of both high-performance analytics and cost-effective storage. Let’s break down the strengths of each.
Amazon S3: Cost-Effective Storage at Scale
Amazon S3 is a great place to store raw data. It’s built to scale, supports any file format, and is pay-as-you-go. For log data, telemetry, sensor output, or data you don’t need to query often, it’s the obvious choice.
S3 offers cheap and efficient data storage compared to Amazon Redshift. However, the storage savings come with a performance trade-off: internal tables in Amazon Redshift hold data that has already been extracted and loaded into a table format, whereas data left in S3 has to be read from files at query time.
Amazon Redshift: Performance for Analytics
Amazon Redshift is a high-performance data warehouse optimized for SQL-based analytics. It's great for structured data you want to slice, dice, and visualize fast.
You’ll typically want to load curated, transformed data into Redshift. This includes star schemas, aggregated metrics, and data you need to join frequently.
With Redshift, performance comes at a higher cost, but when you need fast dashboard loads or frequent joins, it pays off. While Redshift excels at performance, it’s not always the most cost-effective choice, and that’s where Redshift Spectrum comes in.
Redshift Spectrum: Query S3 Without Loading Data
Redshift Spectrum gives you the best of both worlds: you can run Redshift SQL queries on data that remains stored in S3, exposed to Redshift as an external table. That means:
- No need to load data into Redshift tables
- Lower storage costs
- Seamless joins with other Redshift tables
This is especially useful for large historical datasets or infrequently accessed logs.
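As a rough illustration of what an external table definition looks like, the sketch below renders a `CREATE EXTERNAL TABLE` statement as a string. The schema, table, column, and bucket names are all hypothetical, and it assumes Parquet files and an external schema that already exists in Redshift. It only builds the SQL text; it does not connect to anything.

```python
def external_table_ddl(schema, table, columns, partitions, location):
    """Render a Redshift Spectrum CREATE EXTERNAL TABLE statement as text."""
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    parts = ", ".join(f"{name} {dtype}" for name, dtype in partitions)
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n  {cols}\n)\n"
        f"PARTITIONED BY ({parts})\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{location}';"
    )


# Every name below is a made-up example, not taken from a real environment.
ddl = external_table_ddl(
    schema="spectrum_iot",
    table="device_events",
    columns=[
        ("device_id", "VARCHAR(64)"),
        ("event_ts", "TIMESTAMP"),
        ("reading", "DOUBLE PRECISION"),
    ],
    partitions=[("event_date", "DATE")],
    location="s3://example-iot-bucket/device_events/",
)
print(ddl)
```

The data itself never moves: the statement only records where the files live and how to interpret them, which is why there is no load step.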
Tip: Aggregate and filter in Spectrum before loading into an internal table
Spectrum, which can be used with Matillion Data Productivity Cloud, is fantastic at filtering and aggregating very large datasets. The best performance comes from taking the load off Amazon Redshift. This means you should filter and aggregate in Spectrum before you start joining data; the joins themselves can then be handled in Amazon Redshift.
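The shape of such a query can be sketched locally. The example below uses Python's built-in `sqlite3` purely as a stand-in for Redshift, with made-up tables and values: `device_events` plays the role of a Spectrum external table and `devices` the role of a small internal table. The point is the ordering, not the engine: the large table is filtered and aggregated first, and only the reduced result is joined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for a Spectrum external table over raw S3 data (hypothetical).
    CREATE TABLE device_events (device_id TEXT, event_date TEXT, reading REAL);
    -- Stand-in for a small internal Redshift table (hypothetical).
    CREATE TABLE devices (device_id TEXT PRIMARY KEY, device_type TEXT);
    INSERT INTO device_events VALUES
        ('d1', '2025-04-27', 10.0),
        ('d1', '2025-04-28', 20.0),
        ('d1', '2025-04-28', 30.0),
        ('d2', '2025-04-28', 5.0);
    INSERT INTO devices VALUES ('d1', 'speaker'), ('d2', 'tv_stick');
""")

query = """
    WITH daily AS (
        -- Filter and aggregate the large table first...
        SELECT device_id, event_date,
               COUNT(*) AS events, AVG(reading) AS avg_reading
        FROM device_events
        WHERE event_date = '2025-04-28'
        GROUP BY device_id, event_date
    )
    -- ...then join the much smaller result to the internal table.
    SELECT d.device_type, daily.events, daily.avg_reading
    FROM daily
    JOIN devices AS d ON d.device_id = daily.device_id
    ORDER BY d.device_type
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('speaker', 2, 25.0), ('tv_stick', 1, 5.0)]
```

In a real Redshift cluster the common table expression would read from the external table, so the join would only ever see one row per device per day rather than every raw event.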
Real-World Example: IoT Data Flow
To see how this works in practice, let’s look at an IoT data scenario, where data flows from connected devices into the cloud.

1. Data collection and load to S3
Data is collected by devices, such as Amazon Alexa, Echo or Fire TV Stick, and streamed into S3 via Kinesis Firehose.
- ⇨ Why are we sending data to S3?
Staging the data in S3 and accessing it via Spectrum means there is no data loading time, since the data stays in S3.
2. Store data in S3
The data is streamed into an S3 bucket, which Spectrum can then read when we execute a job.
- ⇨ Why are we storing log data in S3?
S3 offers cheap and efficient data storage compared to Amazon Redshift. However, the storage savings come with a performance trade-off, because the data has to be read into Amazon Redshift before it can be transformed.
3. Query and Combine via Matillion
Using Matillion Data Productivity Cloud with Amazon Redshift, you can create pipelines that:
- Read data from S3 (via Spectrum)
- Join with internal tables already in Redshift
- Transform the results - perhaps adding derivations, enrichment, or aggregation - and save the results into a new table
4. Create a New Table in Redshift
Matillion creates a table in Redshift with the results of the transformation and joins.
5. (Optional) Load your new table to S3
You can use Matillion’s Rewrite Table or Rewrite External Table components to push your output back to S3.
- ⇨ Tip: Partition your data!
Partitioning your data allows you to place sensible breakpoints, based on the data, that split up the data into logical chunks. This means a partition, as opposed to the full dataset, can be tackled by multiple nodes, improving processing times and reducing cost.
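One common way to lay out partitions in S3 is a Hive-style key structure, where each partition value appears in the object path. The sketch below assumes daily partitions and a hypothetical bucket name; the helper names are illustrative only. It shows how a query limited to a date range maps to a small set of prefixes instead of the whole dataset.

```python
from datetime import date, timedelta


def partition_prefix(base, day):
    """Build a Hive-style prefix such as .../year=2025/month=04/day=28/."""
    return f"{base.rstrip('/')}/year={day:%Y}/month={day:%m}/day={day:%d}/"


def prefixes_for_range(base, start, end):
    """List the daily prefixes covering the inclusive range [start, end]."""
    days = (end - start).days
    return [partition_prefix(base, start + timedelta(days=i)) for i in range(days + 1)]


base = "s3://example-iot-bucket/device_events"  # hypothetical bucket and prefix
prefixes = prefixes_for_range(base, date(2025, 4, 27), date(2025, 4, 28))
for prefix in prefixes:
    print(prefix)
```

With a layout like this, a query filtered to two days only needs to touch two prefixes, however many years of history sit in the bucket.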
Redshift vs. S3: Where Matillion Fits In
Hybrid data workflows are no longer the exception but the norm, so having a way to manage them effectively and efficiently is crucial.
And that is where Matillion comes in, enabling the seamless management of hybrid data workflows by:
- Creating and orchestrating hybrid pipelines across S3, Redshift, and Spectrum
- Transforming data where it lives (in place or during load)
- Enabling teams to design pipelines with low/no code simplicity or full SQL control
So, Where Should Data Be Stored? Use Cases at a Glance
Choosing between S3, Redshift, or a hybrid approach isn’t always straightforward; it depends on the nature of your data and how you plan to use it. Here’s a quick guide to help you decide based on common scenarios:
| Use Case Description | Best Storage Approach | Why It Works Well |
| Archiving logs, staging raw or semi-structured data | S3 | Minimizes costs with pay-as-you-go object storage. Ideal for data you rarely query |
| Dashboards, curated data marts, frequent joins | Redshift | High-performance SQL engine for fast, complex analytics workloads |
| Querying large datasets without loading into Redshift | S3 + Spectrum | Keeps storage costs low while enabling direct querying from Redshift |
| Real Time | S3 + Spectrum | New data, for example logfiles or IoT records, are immediately available to queries against Spectrum external tables |
| Near Real Time | S3 + Matillion + Redshift | Matillion quickly orchestrates ingestion and transformations, loading near real-time updates into Redshift for faster analytics without manual intervention |
| Cost-effective storage with transformation/orchestration | S3 + Matillion | Use Matillion to orchestrate transformations without moving data unnecessarily |
| Repeatable, high-performance analytics pipelines | Redshift + Matillion | Combines a fast warehouse with powerful ETL orchestration and transformation tools |
Final Thoughts
There’s no one-size-fits-all answer, and that’s a good thing. With tools like Redshift Spectrum and Matillion Data Productivity Cloud, tailored solutions can be designed for any cost model or analytical need.
Want help building the right architecture for your workloads? Book a demo or start a free trial to see how Matillion makes hybrid data easier.