The Unexpected Consequences of Under-provisioning Rails

There’s a common performance pattern in Rails applications, and it looks like this:

[Chart: the throughput shows an inverse relationship with response time.]

In this example you can see that request processing time increased significantly, to about 150 ms per request, as throughput dropped off. Users of the site, however, reported that page display times were more like one to two minutes. What also makes this a bit unusual is that you would expect to see higher response times with higher load (indicated by higher or maxed-out throughput), not lower load.

Clearly the app took a hit for a couple of minutes, somewhere. A burst of traffic? That would cause throughput to increase, not fall by half. It might be tempting to blame the back end: something is amiss in the database, everything gets backed up, the database throughput drops, and therefore so does the front end.

In reality that’s probably only half the explanation. For this pattern, a slowdown on the back end, such as in the database or a web service, would likely explain the increase in overall response time on the front end. But the drop in throughput is due to a lack of capacity in the Rails tier. If you increase the number of Mongrel or Passenger instances, then next time you will see less of a drop in throughput.

[Chart: the scalability chart over the same time period gives a different view of the same pattern.]

Here’s another perspective on what’s happening. This image is the “Scalability Chart” in RPM, and it shows response time plotted against throughput. The cluster on the right half of the graph is the normal range of operation, showing a healthy website: as the throughput increases, the response time shows slight linear growth. But on the left half of the chart you see low throughput and high response times.

The insidious effect of this is that your user experience gets worse. I don’t mean going from 50 ms to 300 ms. I mean from sub-second response to 15 seconds… 30 seconds… or more.

You can use Little’s Law, which says that the number of items in a system (N) equals throughput (T) times residence time (R), to understand exactly what’s going on.

Let’s take an example where we see average processing time double from 200 ms to 400 ms, and throughput drop by half. Now let’s analyze the entire system, including the users. For a given user load, users are either waiting for a page to load or thinking about the page. Let’s say they spend 15 seconds thinking, and the rest of the time waiting for a page to load. Let’s call the throughput (T) the rate at which a user completes the page load/think transaction. In that case, the number of transactions in the system (N) is equal to the number of users on the site while this is going on, and it’s fixed. The residence time (R) is the time spent thinking plus the time spent waiting for the page to load. If the graph shows a drop in throughput, that means T is dropping. N is fixed, so by Little’s Law R has to increase proportionally.
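
To make the arithmetic concrete, here’s a minimal back-of-the-envelope sketch in Ruby. The user count of 100 is a hypothetical value for illustration; only the ratios matter:

    # Little's Law: N = T * R, so T = N / R.
    users     = 100          # hypothetical fixed number of users on the site
    think     = 15.0         # seconds spent thinking about each page
    page_load = 0.2          # 200 ms spent waiting for each page

    r = think + page_load    # residence time per load/think transaction (15.2 s)
    t = users / r            # throughput, in transactions per second

    puts "R = #{r} s, T = #{t.round(2)} transactions/s"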

Let’s say the throughput dropped by half, similar to the situation we measured above. This means R doubles; we know that from applying Little’s Law when N is fixed. Prior to the drop in throughput, R was 15 seconds of think time plus 200 ms waiting for the page to load, or 15.2 seconds. Doubling R means it is now 30.4 seconds: 15 seconds of think time (which doesn’t change) plus 15.4 seconds to load the page. RPM won’t show that, because the extra time is being spent outside Rails, such as in the Apache, Mongrel, or haproxy queue.
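
Carrying the same sketch through the drop in throughput:

    # N is fixed and T halves, so R doubles (Little's Law: N = T * R).
    think = 15.0             # think time doesn't change
    r_old = think + 0.2      # 15.2 s before the slowdown
    r_new = 2 * r_old        # 30.4 s after throughput drops by half

    puts "page wait: #{(r_new - think).round(1)} s"   # => 15.4 s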

Now let’s get back to the cause. How is this a Rails capacity issue, and not just a slow database? Apply Little’s Law now just to the Rails tier, where you have a fixed number of instances. Each instance can only handle one request at a time, so when they are all busy, the number of things in the system (N) is fixed at the number of Rails instances you allocated. But the turnaround time for each request (R) doubled because of something happening in the database tier; let’s say a backup was running. You can’t increase N, so throughput has to drop by half, and users are now waiting 15 seconds for a page instead of about half a second.
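
Applying the same law to the Rails tier alone, as a sketch (the instance count of 10 is hypothetical):

    # On the Rails tier, N is capped at the number of instances.
    instances = 10       # hypothetical: all busy, one request each
    r_normal  = 0.2      # 200 ms per request, normally
    r_backup  = 0.4      # request time doubles while the backup runs

    t_normal = instances / r_normal   # 50 requests/s
    t_backup = instances / r_backup   # 25 requests/s: N can't grow, so T halves

    puts "throughput: #{t_normal.round} -> #{t_backup.round} requests/s"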

So let’s say you had allocated more Rails instances, so that generally more than half of them were idle. That would be 2 * N. When R doubled, the number of busy instances (N) could double to absorb it, so there’d be no need for a drop in throughput. Users do see an increase in response time, but it’s from 200 ms to 400 ms, not 200 ms to 15 seconds! Big difference.
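
The over-provisioned case, continuing the same hypothetical numbers:

    # With 2 * N instances (half idle in normal operation), the number of
    # busy instances can double when R doubles, so throughput holds steady.
    instances = 20       # hypothetical over-provisioned tier (2 * N)
    r_backup  = 0.4      # doubled request time during the backup

    t_backup = instances / r_backup   # 50 requests/s: no drop in throughput
    puts "throughput during backup: #{t_backup.round} requests/s"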

So it all boils down to this: when you run out of capacity in the application tier, you’ll know because throughput and response time move in opposite directions. And when you reach the point where your throughput is dropping, you can be sure your users are waiting much, much longer for pages than the time it takes to process them. Time to add instances!

Of course there are a number of simplifying assumptions made in these calculations, which I won’t go into now. Suffice it to say, when I’ve seen this pattern for applications monitored by RPM this is usually what’s going on.


One response to “The Unexpected Consequences of Under-provisioning Rails”

  1. Hi Bill,

    Great post. I think one of the simplifying assumptions that is really good to highlight is that although Little’s Law describes the relationship between metrics, it doesn’t predict what happens when you change them. Time spent in the system is R (response time or residence time), which is queue wait time plus service time. Service time is independent of the arrival rate, but queue wait time is not, and that makes the following statement not quite untrue, but not wholly descriptive either:

    “So let’s say you had allocated more Rails instances, so that generally more than half of them were idle. That would be 2 * N. When R doubled, the number of busy instances (N) could double to absorb it, so there’d be no need for a drop in throughput. Users do see an increase in response time, but it’s from 200 ms to 400 ms…”

    If the Rails instances were running about 50% utilization, the reader might infer that you could increase the utilization to 100% and the response time would simply double. This isn’t what would really happen, because the queue wait would go through the roof as the utilization increased. To compute the actual change in response time, you’d need to use the Erlang C function.

    It is true that when R doubles, then N doubles, all other things being equal, but I just wanted to point this non-linearity out so that your readers don’t extrapolate this to places it no longer holds.
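
For readers who want to follow the commenter’s suggestion, here is a minimal sketch of the Erlang C calculation, assuming an M/M/c queue. The arrival and service rates below are hypothetical illustration values, not measurements from the application above:

    # Erlang C: the probability that an arriving request has to queue in an
    # M/M/c system, and the resulting mean response time.
    def factorial(n)
      (1..n).reduce(1, :*)
    end

    def erlang_c(servers, offered_load)
      rho = offered_load / servers
      raise ArgumentError, "utilization must be below 1" if rho >= 1.0
      top    = (offered_load**servers / factorial(servers)) / (1 - rho)
      bottom = (0...servers).sum { |k| offered_load**k / factorial(k) } + top
      top / bottom
    end

    # Hypothetical numbers: 10 Rails instances, each able to serve 5 requests/s
    # (200 ms service time), with requests arriving at 40 per second.
    arrival_rate = 40.0
    service_rate = 5.0
    servers      = 10
    offered_load = arrival_rate / service_rate     # 8.0, i.e. 80% utilization

    p_wait     = erlang_c(servers, offered_load)
    queue_wait = p_wait / (servers * service_rate - arrival_rate)
    response   = queue_wait + 1 / service_rate     # queue wait + service time

    # At 80% utilization this works out to P(wait) ≈ 0.41 and ≈ 240 ms, already
    # above the 200 ms service time, and it climbs steeply toward utilization 1.
    puts format("P(wait) = %.3f, mean response = %.0f ms", p_wait, response * 1000)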
