How often do Rails apps get deployed?

How often do you deploy updates to your applications?  How does your deployment rate compare to other Rails applications?

At New Relic, we deploy updates to production applications about once a week.  On average we do three more deploys during the week for bug fixes and optimizations.  We also have a staging server where we test out production candidates, and we deploy to that an average of about 30 times a week.

We were curious about how we compared to other applications, so we decided to survey our customers for their deployment patterns.  RPM customers can do some additional configuration to keep track of deployments, allowing them to see in charts where deployments occurred and, depending on the product level, view the before-and-after statistics for each deployment.  It’s a very valuable feature when you are trying to figure out whether a change to the application caused some unexpected change to your performance profile.

Do frequent deployments lead to trouble?

We looked at the pattern of deployments to individual applications over the month of July for customers who have been tracking deployments since the beginning of the month.  There were quite a few that only deployed once.  The numbers went up to about 50 times per week, but the distribution was heavily weighted toward the less frequent deployments.  The median number of deployments per week was 2.25, but the average was 5.15 times per week.

Keep in mind this includes all applications and doesn’t distinguish production apps from staging apps.  The average number of applications being tracked for deployments was 1.5 per account (at New Relic, we track deployments for 3 applications).

If you look at the distribution, 87% of applications were deployed fewer than 10 times per week.  The rest were deployed up to 50 times per week.

What did we learn from this?  We learned we probably deploy at a higher rate than the average Rails application (we were above the median).  I’d guess the ones that are deploying more than 10 times a week are probably staging or development environments, and our staging environment falls into that category comfortably.

In general I think this shows pretty clearly how far we’ve come since the rigid lifecycle model dominated: we deploy with a frequency you’d probably never have considered in the ‘old days’.  As software designers we use frequent updates as leverage for responding to fast-moving requirements and rapidly introducing fresh innovations.  As consumers we reap the rewards of frequent updates with websites that keep up a steady stream of really useful features, evident in applications like Pivotal’s Tracker and ENTP’s Lighthouse.

How often do you deploy?  To set up your Capistrano deploy.rb to track your deployments see our KB article on Capistrano integration.
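
If you’re using Capistrano, the setup is small: pull in the recipe file that ships with the New Relic agent gem and hook its deployment-notification task into your deploy.  Here’s a minimal sketch; the exact require path, task name, and hooks can vary with your Capistrano and agent versions, so treat the KB article as the authoritative reference.

```ruby
# config/deploy.rb -- minimal sketch of New Relic deployment tracking.
# Task and hook names may differ by Capistrano/agent version; see the KB article.
require 'new_relic/recipes'   # Capistrano tasks bundled with the New Relic agent gem

# Record a deployment marker in RPM once the deploy completes, so the charts
# show exactly where each release went out.
after "deploy",            "newrelic:notice_deployment"
after "deploy:migrations", "newrelic:notice_deployment"
```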

The Unexpected Consequences of Under-provisioning Rails

There’s a common performance pattern in Rails applications and it looks like this:

The throughput shows an inverse relationship with response time.

In this example you can see request processing time increased significantly, to about 150 ms per request, as throughput dropped off.  Users of the site, however, indicated that page display times were more like one to two minutes.  What also makes this a bit unusual is that you expect to see higher response times with higher load (indicated by higher or maxed-out throughput), not lower load.

Clearly the app took a hit for a couple of minutes, somewhere.  Burst of traffic?  That would cause throughput to increase, not fall by half.  It might be tempting to blame the back end. Something is amiss in the database and everything gets backed up. The database throughput drops, therefore so does the front end.

In reality that’s probably only half the explanation.  For this pattern, a slowdown on the back end, such as in the database or a web service, would likely explain the increase in overall response time on the front end.  But the drop in throughput is due to a lack of capacity in the Rails tier.  If you increase the number of mongrels or add Passenger instances, then the next time this happens you will see less of a drop in throughput.

The scalability chart over the same time period gives a different view of the same pattern.

Here’s another perspective on what’s happening.  This image is the “Scalability Chart” in RPM and shows the response time plotted against throughput.  The cluster on the right half of the graph is the normal range of operation, showing a healthy website: as the throughput increases, the response time shows slight linear growth.  But on the left half of the chart you see low throughput and high response times.

The insidious effect of this is that your user experience gets worse.  I don’t mean going from 50 ms to 300 ms.  I mean going from sub-second response to 15 seconds… 30 seconds… or more.

You can use Little’s Law to understand exactly what’s going on.

Let’s take an example where we see average processing time double from 200 ms to 400 ms, and throughput drops in half. Now let’s analyze the entire system, including the users. For a given user load, users are either waiting for a page to load, or thinking about the page. Let’s say they spend 15 seconds thinking, and the rest of the time waiting for a page to load. Let’s call the throughput (T) the rate at which a user completes the page load/think transaction. In that case, the number of transactions in the system (N) is equal to the number of users on the site while this is going on, and it’s fixed. The residence time (R) is the time spent thinking, plus the time spent waiting for the page to load. If the graph shows a drop in throughput, that means T is dropping. N is fixed, so R has to increase proportionally.

Let’s say the throughput dropped by half, similar to the situation we measured above.  This means R doubles; we know that from applying Little’s Law when N is fixed. Prior to the drop in throughput, R was 15 seconds of think time plus 200 ms waiting for the page to load. Doubling R means it is now 15 seconds (think time doesn’t change) plus 15.4 seconds to load the page. RPM won’t show that because that time is being spent outside Rails, such as in the Apache, mongrel, or haproxy queues.
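
If you want to check the arithmetic, here’s the same reasoning written out as a quick Ruby sketch (the numbers are the illustrative ones from the example, not measurements):

```ruby
# Little's Law over the whole system, users included. With N (the number of
# users) fixed, halving T forces R to double. Illustrative numbers only.
think_time = 15.0                     # seconds a user spends reading the page
page_load  = 0.2                      # seconds waiting for the page, before the slowdown
r_before   = think_time + page_load   # R = 15.2 s per think/load cycle

r_after       = r_before * 2          # throughput halved, N fixed => R doubles
new_page_load = r_after - think_time  # think time doesn't change

puts "R: #{r_before} s -> #{r_after} s"                 # 15.2 s -> 30.4 s
puts "page load: #{page_load} s -> #{new_page_load} s"  # 0.2 s -> about 15.4 s
```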

Now let’s get back to the cause.  So how is this a Rails capacity issue, and not just a slow database? Apply Little’s Law now just to the Rails tier, where you have a fixed number of instances. Each instance can only handle one request at a time, so when they are all busy, the number of things in the system (N) is fixed at the number of Rails instances you allocated. But the turnaround time for each request (R) doubled because of something happening in the database tier, say a backup was running. You can’t increase N, so that means throughput has to drop by half, which means users are now waiting 15 seconds for a page, instead of a fraction of a second.

So let’s say you had allocated twice as many Rails instances, so that generally more than half of them were idle. That gives you room for 2 * N requests in flight. When R doubled, N could simply double as well, with no need for a drop in throughput. Users do see an increase in response time, but it’s from 200 ms to 400 ms, not 200 ms to 15 seconds! Big difference.
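
Here’s the same comparison from the Rails tier’s point of view, as a rough Ruby sketch with the same illustrative numbers: when every instance is already busy, a doubled R halves throughput, but with spare instances the tier absorbs the slowdown.

```ruby
# Little's Law applied just to the Rails tier: T = N / R, where N is the number
# of requests in flight (busy instances) and R is the per-request turnaround time.
def throughput(busy_instances, response_time)
  busy_instances / response_time    # requests per second the tier completes
end

r_normal = 0.2   # 200 ms per request in normal operation
r_slow   = 0.4   # per-request time doubles while the database backup runs

# Under-provisioned: all 10 instances are already busy, so N can't grow.
puts throughput(10, r_normal)   # => 50.0 req/s
puts throughput(10, r_slow)     # => 25.0 req/s -- throughput drops by half

# Over-provisioned: with 20 instances, twice as many requests can be in flight,
# so throughput holds and users only see 200 ms become 400 ms.
puts throughput(20, r_slow)     # => 50.0 req/s
```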

So it all boils down to this: when you run out of capacity in the application tier you’ll know because the throughput and response time move in opposing directions, and when you reach the point where your throughput is dropping, you can be sure your users are waiting much, much longer for pages than the time it is taking to process them.  Time to add instances!

Of course there are a number of simplifying assumptions made in these calculations, which I won’t go into now. Suffice it to say, when I’ve seen this pattern for applications monitored by RPM this is usually what’s going on.

A Little Formula for Analyzing Performance

Ever find yourself standing in the checkout line at the grocery store, assessing all the factors determining how long you are going to wait, making calculations about whether you are in the fastest line or not? It’s pretty simple to figure out if you should switch lines.  You just look at the length of the line next to you and the size of the carts in that line, multiply them together, and then compare to your line.

How long of a wait is it going to be? (Photo courtesy of ChrisM70@flickr)

It’s a basic calculation but it’s the same formula that is the basis for many different kinds of analysis of web applications.  I often find myself using the checkout line analogy to explain how I do performance tuning.  The principle is known as Little’s Law, and it relates the queue length, the total time spent in the checkout line, and the time spent checking each customer out.  It says the number of things in a system, N, is equal to the product of the rate at which things pass through the system, T, and the total time spent in the system, R.  Another version of it says that the time you should expect to spend checking out (R) is the time it takes that junior checker to process a cart full of groceries (1/T) multiplied by the number of people ahead of you (N).  You really didn’t need someone like Little to tell you that, did you?
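
Written out in code, the law is just one multiplication.  Here’s the grocery-line reading of it as a tiny Ruby sketch, with made-up numbers purely for illustration:

```ruby
# Little's Law: N = T * R, or equivalently R = N * (1/T).
# Grocery-line reading: N = people ahead of you, 1/T = minutes the checker
# spends per cart, R = minutes you can expect to wait. Made-up numbers.
def expected_wait(people_ahead, minutes_per_cart)
  people_ahead * minutes_per_cart   # R = N * (1/T)
end

# Four carts ahead of you, and the checker takes 3 minutes per cart:
puts expected_wait(4, 3.0)   # => 12.0 minutes in line
```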

I find people who spend a lot of time analyzing system statistics feel the same way.  They develop an intuitive sense of how the system responds under certain conditions, like increased load (N) or maxed out throughput (T).  Intuition makes a great beginning but a lousy conclusion*.  Performance management requires systematic, quantitative analysis.  Understanding how to use Little’s Law will allow you to reinforce intuition with evidence.  It can be applied to any problem independently, and to the system at different levels.  You just need to draw a box around the part of the system you are talking about and plug in the numbers.

For instance, consider a complete web application that has 1000 requests currently being processed or queued for processing.  We also know it’s processing at a rate of 200 requests per second.  The CPU is maxed out so we also know that it can’t increase that rate.  What happens if we suddenly double the load on the server?  Effectively you double N, the number of things in the system.  The throughput (T) is still fixed at 200 requests per second, so that means the response time (R) has to double, according to Little’s law.

Now narrow the scope.  Let’s draw the box around a connection pool. Let’s say you’ve got a DB connection pool limited to 5 connections, and that your average database operation takes 10 ms.  Is this connection pool big enough?  Let’s say you need to support 1000 transactions per second.  The question is: will you ever have more than 5 requests in the pool?  Because that means somebody is waiting for a free connection.  From Little’s Law, N = 0.01 seconds (R) x 1000 requests/second (T), or 10.  You would need to double the size of the connection pool to accommodate that throughput comfortably (not strictly true, of course, unless the time between request arrivals was constant).
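
Both of those “draw a box” exercises reduce to a one-line calculation.  Here they are written out in Ruby, using the same numbers as the text:

```ruby
# The two "draw a box" examples, using Little's Law in each direction.
# All numbers are the illustrative ones from the text.

# Box 1: the whole application. R = N / T.
requests_in_system = 1000.0
requests_per_sec   = 200.0                        # throughput, pinned by the CPU
puts requests_in_system / requests_per_sec        # => 5.0 s response time
puts (requests_in_system * 2) / requests_per_sec  # => 10.0 s once the load doubles

# Box 2: the connection pool. N = R * T.
db_time = 0.010    # 10 ms per database operation
db_rate = 1000.0   # database operations per second
puts db_time * db_rate   # => 10.0 connections in use on average -- a pool of 5 queues
```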

The point is that you can refer to Little’s Law for just about any part of a system.  It also forms the basis for many other important laws used in analyzing web applications, including the Response Time and Utilization laws.  In this blog I’ll have lots more examples where we use Little’s law to analyze system performance.

Hopefully this at least gives you something to think about the next time you’re standing in a long line at the checkout counter.