Understanding Cloud Benchmarks

Lately, benchmarks of various Cloud services have been floating around everywhere; they seem to be all the rage. There are plenty of things to take from these benchmarks, but I haven't seen anyone knit them together into a coherent view of what the Cloud means for your web applications.

For this post, let’s take a look at Amazon’s EC2. We’ve used it here at Engine Yard, our friends at RightScale use it, and there’s generally a pretty good amount of information about it floating around the web. EC2 has a very distinct and interesting behavior profile. Let’s take a moment to see what consensus there is about that behavior. The salient points of most coverage I can find are roughly:

  1. EC2 is fairly variable in terms of absolute performance.
  2. EC2 is at least within the same ballpark as most commodity equipment.
  3. EC2’s EBS performs slightly less well than native disks.
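Point 1 is easy to quantify. As an illustration (the throughput numbers below are invented, not measured from any real instances), the run-to-run variability a benchmark reports is often summarized as a coefficient of variation:

```python
from statistics import mean, stdev

# Hypothetical requests/sec from the same benchmark run on five
# nominally identical instances -- illustrative values only.
ec2_runs = [412, 388, 455, 401, 367]
bare_metal_runs = [510, 508, 512, 509, 511]

def coefficient_of_variation(samples):
    """Relative spread: standard deviation as a fraction of the mean."""
    return stdev(samples) / mean(samples)

print(f"EC2 variability:        {coefficient_of_variation(ec2_runs):.1%}")
print(f"Bare-metal variability: {coefficient_of_variation(bare_metal_runs):.1%}")
```

With numbers shaped like these, the cloud instances show several percent of relative spread while the dedicated boxes sit well under one percent, which is the pattern most of the published benchmarks describe.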

I think everyone with any experience would have guessed the above. Machines are going to have little differences. Small differences in the layout produced by randomized resource allocation have noticeable effects on the behavior of applications. EBS, being some sort of network-attached disk, runs slower than a normal disk because it’s more than just a disk. All of this makes complete sense.

Given the above, the question is “Why are these benchmarks important?” Some more pedantic types would chime in that seeing what you expect on a benchmark is good confirmation. Others with a need-for-speed will claim that this is evidence that rolling your own infrastructure is always preferable because of the aggregate speed benefits. However, I think that these benchmarks only serve to show what is most important to Amazon. I also humbly suggest that what’s good for Amazon is good for you. At least, it should benefit you if your sites grow at all.

In general, these benchmarks show that EC2 is designed for making scalable applications. Their performance isn’t top-of-the-line, but it’s not abysmal either. An EC2 instance is appreciably slower than bare metal, but it’s instantly replaceable. EBS isn’t crazy fast, but it’s a portable, durable data store.

This is what most of the benchmarks out there are missing. Reading them, I immediately realized that raw performance is not the metric that matters to scalable applications. It has at most an instantaneous, linear effect on your application, and every one of these benchmarks is concerned with nothing else. As such, they’re not really useful.

A more noteworthy suite of benchmarks would measure metrics like “cost-to-grow per request.” Such metrics are slippery and difficult to nail down, but they are where EC2 really shines. It’s clear that the message from EC2’s benchmarks is that performance isn’t king, scalability is, and effective application architecture isn’t easily benchmarked. EC2 is designed for applications that can scale. Such applications don’t demand the highest performance, they demand moderate performance. They don’t demand the utmost high-availability, they demand tolerance of failures.
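To make the idea concrete, here is a minimal sketch of what a “cost-to-grow per request” style metric could look like. The function and every number in it are hypothetical assumptions of mine, not real EC2 pricing or throughput data; the point is only that a slower box can still win once utilization is factored in:

```python
# Sketch of a "cost-to-grow per request" metric. All numbers are
# hypothetical and for illustration only -- not real EC2 pricing.

def cost_to_grow_per_request(hourly_cost, peak_requests_per_sec, utilization):
    """Hourly cost of each request/sec of capacity you actually use.

    `utilization` captures that self-hosted hardware must be provisioned
    for peak load and often sits idle, while cloud instances can be
    added and dropped to track demand.
    """
    return hourly_cost / (peak_requests_per_sec * utilization)

# Cloud: slower per box, but capacity tracks load closely.
cloud = cost_to_grow_per_request(hourly_cost=0.10,
                                 peak_requests_per_sec=400,
                                 utilization=0.9)

# Bare metal: faster per box (purchase price amortized per hour),
# but provisioned for peak, so average utilization is low.
metal = cost_to_grow_per_request(hourly_cost=0.076,
                                 peak_requests_per_sec=500,
                                 utilization=0.2)

print(f"cloud: ${cloud:.5f}/hr per req/s, metal: ${metal:.5f}/hr per req/s")
```

Under these made-up assumptions the cloud comes out cheaper per useful request even though each individual instance is slower, which is exactly the trade the post argues EC2 is built around.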

From that angle, it’s pretty clear that both camps are right, sort of. EC2 benchmarks exactly as we’d expect (making the pedants happy), and it’s built to deliver applications at the top end of throughput (theoretically making the speed-addicted happy). The catch is that there is a fundamental switch in how you view throughput. Rather than being about performance, it’s about aggregate performance (sometimes called scalability). That has everything to do with the type of applications on which Amazon was built.

So, how should we interpret the benchmarks of EC2? We should take them to mean that the Cloud is about building applications in ways that weren’t previously possible. We should see that in a world where hardware is commoditized, squeezing out the last drops of performance plays second fiddle to adding hardware. We should notice that making applications tolerate failures is the new “high availability.”

In my opinion, it’s an interpretation that’s long overdue.