Rails and Merb Merge: Performance (Part 2 of 6)

The next significant improvement that we hoped to bring to Rails from Merb was faster performance. Because Merb came after Rails, we had the luxury of knowing which parts of Rails were used most often and optimizing performance for those parts.

For Rails 3, we wanted to take the performance optimizations in Merb and bring them over to Rails. In this post, I'll talk about just a few of the performance optimizations we've added to Rails 3: reducing general controller overhead and (greatly) speeding up rendering a collection of partials.

For our initial performance work, we focused on a few specific but commonly used parts of Rails:

  • General overhead (the router plus the cost of getting in and out of a controller)
  • render :text
  • render :template
  • render :partial
  • rendering a number (10 and 100) of the same partials in a loop
  • rendering a number (10 and 100) of the same partials via the collection feature

This was definitely a limited evaluation, but it covered most of the cases where performance might be at a premium and the Rails developer was unable to do anything about it.

General Controller Overhead

The first thing we tackled was the general overhead of a Rails controller. Rails 2.3 doesn't have any way to measure this in isolation, because you're forced to use render :text to send back text to the client, which implicates the render pipeline. Still, we wanted to reduce it as much as possible.

When doing this work, we used Stefan Kaes' fork of ruby-prof, which comes with the CallStackPrinter (the best way I've ever seen to visualize profile data from a Ruby application). We also wrote a number of benchmarks that could double as profile runs if we wanted to zero in and get more precise data.
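To give a feel for the kind of micro-benchmark described above, here is a minimal sketch using Ruby's standard Benchmark library. It is not the harness we actually used: average_usec and fake_action are hypothetical names, and the lambda merely stands in for a real trip through a controller.

```ruby
require "benchmark"

# Hypothetical helper: runs the block `iterations` times and returns the
# average cost of a single call in microseconds.
def average_usec(iterations)
  total_seconds = Benchmark.realtime do
    iterations.times { yield }
  end
  (total_seconds / iterations) * 1_000_000
end

# Stand-in for "one trip through a controller". A real benchmark would
# dispatch a request through the framework instead.
fake_action = lambda { "Hello world".dup }

usec = average_usec(10_000) { fake_action.call }
puts format("%.2f usec per call", usec)
```

Because the measured work is just a block, the same block could also be handed to a profiler, which is roughly what "doubling as a profile run" means here.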

When we looked at overhead, it was dominated by setting the response. Digging a bit deeper, it turned out that ActionController was setting headers directly, which then needed to be re-parsed before returning the response to get additional information. A good example of this phenomenon was in the Content-Type header, which had two components (the content-type itself and an optional charset). The two components were available on the Response object as getters and setters:

def content_type=(mime_type)
  self.headers["Content-Type"] =
    if mime_type =~ /charset/ || (c = charset).nil?
      mime_type.to_s
    else
      "#{mime_type}; charset=#{c}"
    end
end

# Returns the response's content MIME type, or nil if content type has not been set.
def content_type
  content_type = String(headers["Content-Type"] || headers["type"]).split(";")[0]
  content_type.blank? ? nil : content_type
end

# Set the charset of the Content-Type header. Set to nil to remove it.
# If no content type is set, it defaults to HTML.
def charset=(charset)
  headers["Content-Type"] =
    if charset
      "#{content_type || Mime::HTML}; charset=#{charset}"
    else
      content_type || Mime::HTML.to_s
    end
end

def charset
  charset = String(headers["Content-Type"] || headers["type"]).split(";")[1]
  charset.blank? ? nil : charset.strip.split("=")[1]
end

As you can see, the Response object was working directly against the Content-Type header, and parsing out parts of the header as needed. This was especially problematic because, as part of preparing the response to be sent back to the client, the Response did additional work on the headers:

def assign_default_content_type_and_charset!
  self.content_type ||= Mime::HTML
  self.charset ||= default_charset unless sending_file?
end

So before sending the response, Rails was once again splitting the Content-Type header on the semicolon, and then doing some more String work to put it back together again. And of course, Response#content_type= was used in other parts of Rails, so that it was correctly set based on the template type or via respond_to blocks.
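The round trip is easier to see in isolation. The following is only a simplified sketch: a plain Hash plays the role of the response headers, and parse_type and parse_charset are hypothetical stand-ins for the header-backed getters, not the Rails methods themselves.

```ruby
# A plain Hash stands in for the response headers.
headers = { "Content-Type" => "text/html; charset=utf-8" }

# Reading either component means splitting the header String again.
parse_type = lambda { |h| h["Content-Type"].to_s.split(";")[0] }
parse_charset = lambda do |h|
  part = h["Content-Type"].to_s.split(";")[1]
  part.nil? ? nil : part.strip.split("=")[1]
end

content_type = parse_type.call(headers)
charset      = parse_charset.call(headers)

# Writing one component re-reads the other and rebuilds the whole String.
headers["Content-Type"] = "#{parse_type.call(headers)}; charset=iso-8859-1"
```

Every read allocates new Strings and Arrays, and every write triggers another read first, which is where the overhead came from.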

This was not costing hundreds of milliseconds per request, but in applications that are extremely cache-heavy, the overhead cost could be larger than the cost of pulling something out of cache and returning it to the client.

The solution in this case was to store the content type and charset in instance variables in the response, and merge them in a quick, simple operation when preparing the response.

attr_accessor :charset, :content_type

def assign_default_content_type_and_charset!
  return if headers[CONTENT_TYPE].present?

  @content_type ||= Mime::HTML
  @charset      ||= self.class.default_charset

  type = @content_type.to_s.dup
  type << "; charset=#{@charset}" unless @sending_file

  headers[CONTENT_TYPE] = type
end

So now, we're just looking up instance variables and creating a single String. A number of changes along these lines got overhead down from about 400usec to 100usec. Again, not a huge amount of time, but it could really add up in performance-sensitive applications.
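For readers who want to play with the idea outside of Rails, here is a self-contained sketch of the same approach. SketchResponse, finalize_content_type! and the default constants are hypothetical names chosen for illustration. This is not the ActionController class, and it leaves out details such as file sending.

```ruby
# Hypothetical stand-in for a response object (not ActionController's class).
class SketchResponse
  CONTENT_TYPE    = "Content-Type"
  DEFAULT_TYPE    = "text/html"
  DEFAULT_CHARSET = "utf-8"

  # Plain instance variables: reads and writes involve no String parsing.
  attr_accessor :charset, :content_type
  attr_reader :headers

  def initialize
    @headers = {}
  end

  # Called once, just before the response is sent. Builds the header
  # String a single time from the stored components.
  def finalize_content_type!
    return if @headers.key?(CONTENT_TYPE)

    @content_type ||= DEFAULT_TYPE
    @charset      ||= DEFAULT_CHARSET
    @headers[CONTENT_TYPE] = "#{@content_type}; charset=#{@charset}"
  end
end

response = SketchResponse.new
response.content_type = "application/json"
response.finalize_content_type!
puts response.headers["Content-Type"]
```

The design choice is simply to keep structured data structured until the last possible moment, and to serialize it exactly once.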

Render Collections of Partials

Rendering collections of partials presented another good opportunity for optimization. And this time, the improvement was measured in milliseconds, not microseconds!

First, here was the Rails 2.3 implementation:

def render_partial_collection(options = {}) #:nodoc:
  return nil if options[:collection].blank?

  partial = options[:partial]
  spacer = options[:spacer_template] ? render(:partial => options[:spacer_template]) : ''
  local_assigns = options[:locals] ? options[:locals].clone : {}
  as = options[:as]

  index = 0
  options[:collection].map do |object|
    _partial_path ||= partial ||
      ActionController::RecordIdentifier.partial_path(object, controller.class.controller_path)
    template = _pick_partial_template(_partial_path)
    local_assigns[template.counter_name] = index
    result = template.render_partial(self, object, local_assigns.dup, as)
    index += 1
    result
  end.join(spacer)
end

The important part here is what happened inside the loop, which could run hundreds of times for a large collection of partials. Here, Merb had a higher-performance implementation that we were able to bring over to Rails. This is the Merb implementation:

with = [opts.delete(:with)].flatten
as = (opts.delete(:as) || template.match(%r[(?:.*/)?_([^\./]*)])[1]).to_sym

# Ensure that as is in the locals hash even if it isn't passed in here
# so that it's included in the preamble.
locals = opts.merge(:collection_index => -1, :collection_size => with.size, as => opts[as])
template_method, template_location = _template_for(
  opts.delete(:format) || content_type,
  # ... remaining arguments not shown in this excerpt
)

# this handles an edge-case where the name of the partial is _foo.* and your opts
# have :foo as a key.
named_local = opts.key?(as)

sent_template = with.map do |temp|
  locals[as] = temp unless named_local

  if template_method && self.respond_to?(template_method)
    locals[:collection_index] += 1
    send(template_method, locals)
  else
    raise TemplateNotFound, "Could not find template at #{template_location}.*"
  end
end.join


Now this wasn't perfect by a long shot. There was a lot going on here (and I'd personally like to have seen the method refactored). But the interesting part is what happened inside the loop (starting from sent_template = with.map). Unlike ActionView, which figured out the name of the template, got the template object, got the counter name, and so on, Merb limited the activity inside the loop to setting a couple of Hash values and calling a method.
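The pattern of doing the invariant work once and keeping the loop body down to a couple of Hash writes and a method call can be sketched in a few lines. This is an illustration under assumptions, not Merb's code: SketchView, _item_partial and render_items are hypothetical names, and the "compiled template" is just an ordinary method.

```ruby
class SketchView
  # Stand-in for a template that has been compiled to a method.
  def _item_partial(locals)
    "#{locals[:collection_index] + 1}/#{locals[:collection_size]}: #{locals[:item]}\n"
  end

  def render_items(items)
    # Invariant work happens once, before the loop: resolve the template
    # method, check that it exists, and build the locals Hash.
    template_method = :_item_partial
    raise ArgumentError, "missing template" unless respond_to?(template_method)
    locals = { :collection_index => -1, :collection_size => items.size }

    # Per item: two Hash writes and one method call.
    items.map do |item|
      locals[:collection_index] += 1
      locals[:item] = item
      send(template_method, locals)
    end.join
  end
end

puts SketchView.new.render_items(%w[a b])
```

Note that the locals Hash is reused and mutated across iterations, which avoids allocating a new Hash per item.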

For a collection of 100 partials, this could be the difference between overhead of around 10ms and overhead of around 3ms. For a collection of small partials, that difference could be significant (and it gave people a reason to inline partials that were perfectly appropriate as partials in the first place).

In Rails 3, we've improved performance by reducing what happens inside the loop. Unfortunately, there was a specific feature of Rails that made it a bit harder to optimize this generically. Specifically, you could render a partial with a heterogeneous collection (a collection containing Post, Article and Page objects, for instance) and Rails would render the correct template for each object (Article objects render _article.html.erb, etc.). This means that it was not always possible to determine the template to render up front.

We haven't been able to optimize the heterogeneous case completely, but we have made render :partial => "name", :collection => @array faster. To achieve this, we split the code paths, with a fast path for when we know the template up front and a slow path for when it has to be determined based on each object.
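The split can be pictured with a small, self-contained sketch. Everything here is an assumption made for illustration: templates are plain lambdas stored in a TEMPLATES Hash, and the method names are hypothetical rather than the ones used in Rails.

```ruby
Post    = Struct.new(:title)
Article = Struct.new(:title)

# Hypothetical template registry: partial name => callable.
TEMPLATES = {
  "post"    => lambda { |object| "<li>Post: #{object.title}</li>" },
  "article" => lambda { |object| "<li>Article: #{object.title}</li>" }
}

# Fast path: the template is known up front, so it is looked up once.
def render_with_known_template(name, collection)
  template = TEMPLATES.fetch(name)
  collection.map { |object| template.call(object) }.join
end

# Slow path: the template depends on each object's class, so the lookup
# has to happen inside the loop.
def render_by_object_class(collection)
  collection.map do |object|
    TEMPLATES.fetch(object.class.name.downcase).call(object)
  end.join
end

# Dispatch between the two paths based on whether a partial name was given.
def render_collection(collection, partial = nil)
  if partial
    render_with_known_template(partial, collection)
  else
    render_by_object_class(collection)
  end
end

puts render_collection([Post.new("a"), Post.new("b")], "post")
puts render_collection([Post.new("a"), Article.new("b")])
```

The common case pays only for what it needs, while the heterogeneous case keeps working as before.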

So now, here's what rendering a collection looks like, when we know the template:

def collection_with_template(template = @template)
  segments, locals, as = [], @locals, @options[:as] || template.variable_name

  counter_name  = template.counter_name
  locals[counter_name] = -1

  @collection.each do |object|
    locals[counter_name] += 1
    locals[as] = object

    segments << template.render(@view, locals)
  end

  @template = template
  segments
end

Importantly, the loop is now tiny (even simpler than what happened in Merb inside the loop). Something else worth mentioning is that in improving the performance of this code, we created a PartialRenderer object to track state. Even though you might expect that creating a new object would be expensive, it turns out that object allocations are relatively cheap in Ruby, and objects can provide opportunities for caching that are more difficult in procedural code.
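To illustrate the point about objects and caching, here is a hedged sketch of a renderer object that remembers which template goes with which class. It is not the Rails PartialRenderer: SketchPartialRenderer, its lookups counter, and the lambda-based templates are all hypothetical.

```ruby
# Hypothetical renderer object that carries state between iterations.
class SketchPartialRenderer
  attr_reader :lookups

  def initialize(templates)
    @templates = templates # partial name => callable
    @by_class  = {}        # cache: object class => callable
    @lookups   = 0         # how many times the registry was consulted
  end

  def render_collection(collection)
    collection.map { |object| template_for(object).call(object) }.join
  end

  private

  # The registry is consulted only the first time each class is seen.
  def template_for(object)
    @by_class[object.class] ||= begin
      @lookups += 1
      @templates.fetch(object.class.name.downcase)
    end
  end
end

Page = Struct.new(:title)
Note = Struct.new(:title)

renderer = SketchPartialRenderer.new(
  "page" => lambda { |o| "[page #{o.title}]" },
  "note" => lambda { |o| "[note #{o.title}]" }
)

output = renderer.render_collection(
  [Page.new("a"), Note.new("b"), Page.new("c"), Page.new("d")]
)
puts output
puts renderer.lookups
```

With procedural code, that cache would have to be threaded through every call. With an object, it is just another instance variable.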

For those of you who want to see the improvements in pictures, here are a few things to look at: first, we have the improvement between Rails 2.3 and Rails 3 edge on Ruby 1.9 (smaller is faster).

And here it is for more expensive operations:

Last we've got a comparison of Rails 3 across four implementations (Ruby 1.8, Ruby 1.9, Rubinius, and JRuby):

You can see that Rails 3 is significantly faster than Rails 2.3 across the board, and that all implementations (including Rubinius!) are significantly improved over Ruby 1.8. All in all, a great year for Ruby!

Next post, I'll talk about improvements in the Rails 3 API for plugin authors—keep an eye out, and as always, leave your comments!