The $10 supercomputer that’s revolutionising analytics

  • Richard Thelwell
  • June 24, 2015

An article on Bloomberg Business the other day profiled data scientist Braxton McKee, who carries out complex artificial intelligence analyses for Wall Street hedge funds. It’s a job that involves building financial models with massive data frames—think a million rows by a million columns, for instance, totalling a trillion entries in all.

Yet Mr McKee aims to crunch that data in the time that it takes him to make a cup of coffee—and to do so, what’s more, on a supercomputer costing just $10.

And if you’re thinking that you can’t buy a supercomputer for $10, you’re exactly right. Especially one capable of crunching a trillion data points in the time that it takes to brew your favourite beans.

But you can rent time on just such a computer—which is exactly what Mr McKee does. And yes, the cost of that rental can be as low as $10.

Make me an offer

Now, let’s be clear. That isn’t ‘everyday low pricing’. Instead, it’s tantamount to auction-based pricing: buying surplus computing capacity at Cloud-hosting giant Amazon Web Services, a $5 billion business that’s part of retail giant Amazon.

Because for data scientists running their analyses in Amazon’s Elastic Compute Cloud (EC2), it’s possible to submit bids for spare computing capacity that isn’t being used by the firm’s business customers.


So if one of Amazon’s powerful EC2 server clusters finds itself with nothing to do for a while, then a bid as low as $10 can be accepted.

As Amazon itself puts it:

“Spot Instances allow you to name your own price for Amazon EC2 computing capacity. You simply bid on spare Amazon EC2 instances and they run whenever your bid exceeds the current Spot Price, which varies in real time, based on supply and demand.”
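The rule Amazon describes—your instance runs whenever your bid meets the current Spot Price—can be sketched in a few lines of Python. This is a toy illustration of the pricing mechanism, not the AWS API, and the prices below are invented:

```python
def spot_instance_running(bid, spot_price):
    """An instance runs whenever the bid meets or exceeds the current spot price."""
    return bid >= spot_price

# The spot price varies in real time with supply and demand, so the
# same $10 bid may win at quiet times and lose out at busy ones.
# (These hourly prices are invented, purely for illustration.)
for hour, spot_price in [("02:00", 8.50), ("09:00", 12.75), ("14:00", 9.90)]:
    status = "running" if spot_instance_running(10.00, spot_price) else "interrupted"
    print(f"{hour}: spot price ${spot_price:.2f} -> {status}")
```

The flip side of the rule is just as important: the moment the spot price climbs above your bid, your instance can be interrupted—which is why this pricing suits some workloads and not others, as we'll see below.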

A $10 supercomputer? You’d better believe it. And what’s more, that $10 represents a discount of around 90% on normal ‘On Demand’ pricing for the sort of computing power such vast financial models require.

So what does this mean for business?

That depends on the kinds of reporting and analytics that a business wants to perform.

Would you want to have your regular day-to-day reporting carried out on this ‘Spot Instance’ basis, for instance? Probably not. Likewise with those everyday queries that need to be run, and dashboards that must be built.

As Amazon itself says, Spot Instance pricing is best suited to ‘time flexible and interruption tolerant tasks’—in other words, calculations where you want the eventual answer but are flexible as to when you get it, and where the work can be stopped and restarted without impacting the analysis.
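What does ‘interruption tolerant’ look like in practice? Typically, a job that checkpoints its progress so it can pick up where it left off when capacity comes back. A minimal sketch—the checkpoint file and its format are our own invention, not anything AWS prescribes:

```python
import json
import os

CHECKPOINT = "progress.json"  # hypothetical checkpoint file

def load_checkpoint():
    """Resume state from a previous (possibly interrupted) run."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {"done": 0, "total": 0}

def save_checkpoint(state):
    with open(CHECKPOINT, "w") as f:
        json.dump(state, f)

def process_items(items):
    """Sum a list of numbers, checkpointing after every item."""
    state = load_checkpoint()
    for i in range(state["done"], len(items)):
        state["total"] += items[i]   # stand-in for the real calculation
        state["done"] = i + 1
        save_checkpoint(state)       # safe to interrupt after any item
    return state["total"]
```

If the Spot Instance is reclaimed mid-run, the next run simply reloads the checkpoint and skips the work already done—the eventual answer is unaffected.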

Periodic ‘Big Data’ analyses? Absolutely. Special customer analytics reports, such as basket and ‘recency’ analyses? Certainly. But day-to-day reporting? Almost certainly not.

Meet Amazon Redshift

Yet it’s here, in the realms of regular ‘On Demand’ computing, that things are getting really interesting.

Because Amazon has re-written the rules of data warehousing and Big Data analytics by combining columnar data store technology (also known as column-oriented databases) with the massively parallel processing that’s made possible by clustering together its Cloud-based servers.

The result is Amazon Redshift. Currently Amazon Web Services’ fastest-growing service, Amazon Redshift is a massively parallel columnar data store that can deal with billions of rows of data—but one that can be set up in a few minutes, and operated for a few cents an hour.


Now, this isn’t the place to explain in detail exactly how columnar data store technology works, and why you might want to use it. Let’s just say this: in contrast to traditional row-oriented databases, the data store is more compact (there’s no separate index), and also blindingly fast—because the data acts as its own index.
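The intuition can be shown in a few lines of Python. This is a toy model of the two storage layouts—not Redshift itself, and the table data is invented—but it shows why an analytic query that touches one column is so much cheaper against a column store:

```python
# The same small table, stored two ways.
rows = [                      # row store: whole records kept together
    {"id": 1, "region": "UK", "sales": 120.0},
    {"id": 2, "region": "US", "sales": 340.0},
    {"id": 3, "region": "UK", "sales": 95.0},
]
columns = {                   # column store: each column kept together
    "id": [1, 2, 3],
    "region": ["UK", "US", "UK"],
    "sales": [120.0, 340.0, 95.0],
}

# 'Total sales': the row store must walk every record, field by field...
total_row_store = sum(r["sales"] for r in rows)
# ...while the column store scans one contiguous column and ignores the rest.
total_col_store = sum(columns["sales"])
```

Both produce the same answer, of course—but scale the table to billions of rows and hundreds of columns, and reading only the columns a query actually needs is what makes the columnar approach so fast.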

Multiple nodes = raw horsepower

At which point we throw in linearly scalable massively parallel processing, meaning that a 16-node Amazon Redshift cluster will process queries approximately twice as fast as an 8-node cluster, which in turn will be approximately twice as fast as a 4-node cluster.
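Put simply, linear scalability means query time is inversely proportional to node count. As a back-of-envelope model—the 80-second baseline is an invented figure, and real workloads only approximate this ideal:

```python
def estimated_query_seconds(baseline_seconds, baseline_nodes, nodes):
    """Idealised linear scaling: doubling the nodes halves the query time."""
    return baseline_seconds * baseline_nodes / nodes

# If a 4-node cluster answers a query in 80 seconds (invented figure),
# then under ideal linear scaling:
for nodes in (4, 8, 16):
    print(f"{nodes} nodes: {estimated_query_seconds(80, 4, nodes):.0f} seconds")
```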

The result? Supercomputer-like performance, but for normal reporting and query-processing—without, needless to say, supercomputer-like pricing.

Does that sound like a revolution in reporting technology to you?

It does to us.

To find out more about how Amazon Redshift can revolutionise your analytics strategy, download our free eBook below.