We had noticed that requests were occasionally a tad slower than usual. We've been tuning here and tuning there, and each little twist has made a little difference.
I'm a big believer in tuning systems over time. I don't like to twist every knob available to me when I start using a new piece of hardware, because doing so makes it hard to understand why the company that built the product chose to ship it the way they did.
Today, I spent some time digging into the algorithms our load balancers use to distribute requests between slices.
As it turns out, the default algorithm the load balancer uses is rather naive.
I think that our vendor defaulted to the simplest algorithm, one that new users could immediately grasp: deliver requests to each slice in turn, round-robin fashion. This makes it easy to verify that the load balancer is working, because every slice gets requests in a predictable and orderly sequence.
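To make the round-robin idea concrete, here is a minimal sketch in Python. It is not our vendor's implementation; the class name and the slice names are made up for illustration.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hypothetical balancer: hands out slices in a fixed, repeating order."""

    def __init__(self, slices):
        # cycle() repeats the list forever: a, b, c, a, b, c, ...
        self._rotation = cycle(list(slices))

    def pick(self):
        # The choice ignores how busy each slice currently is.
        return next(self._rotation)


# Illustrative slice names, not real hosts.
balancer = RoundRobinBalancer(["slice-a", "slice-b", "slice-c"])
order = [balancer.pick() for _ in range(6)]
print(order)
```

Six requests land on the three slices in the same order twice over, no matter what each slice is doing at the time.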
However, it's far from ideal. Imagine a particularly resource-intensive page on your site. If a slice is already serving one of those expensive pages, why ask it to serve another? It makes more sense to send the next request to a different slice instead.
The load balancer doesn't really know that a page is expensive, of course, but it can see the side effect by watching response times. If a slice is a bit slow to respond, new requests get routed immediately to slices that are not bogged down. Very cool!
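Here is a rough sketch of what response-time-based selection could look like. I don't know how our vendor actually computes it, so the smoothed average, the `smoothing` parameter, and all the names below are my own assumptions.

```python
class FastestResponseBalancer:
    """Hypothetical balancer: prefers the slice with the lowest recent response time."""

    def __init__(self, slices, smoothing=0.5):
        # smoothing is an assumed knob: how much weight the newest sample gets.
        self.smoothing = smoothing
        # Slices with no samples start at 0.0, so they get tried first.
        self.avg_ms = {name: 0.0 for name in slices}

    def record(self, name, elapsed_ms):
        # Exponentially weighted moving average of observed response times.
        old = self.avg_ms[name]
        self.avg_ms[name] = (1 - self.smoothing) * old + self.smoothing * elapsed_ms

    def pick(self):
        # Route to whichever slice has been answering fastest lately.
        return min(self.avg_ms, key=self.avg_ms.get)


# Illustrative slice names and timings, not measurements from our systems.
balancer = FastestResponseBalancer(["slice-a", "slice-b", "slice-c"])
balancer.record("slice-a", 900.0)  # stuck on an expensive page
balancer.record("slice-b", 40.0)
balancer.record("slice-c", 60.0)
first_choice = balancer.pick()

balancer.record("slice-b", 2000.0)  # now slice-b is the slow one
second_choice = balancer.pick()
```

In this sketch the slow slice is skipped without the balancer knowing anything about the page itself; it only reacts to the timings it has seen.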
In any case, all customers will immediately see an improvement in average response time, and that's a good thing for everyone!