The Queue API in Rails 4 is supposed to be an abstraction layer for background processing. It ships with a basic implementation, but developers are expected to swap out the default backend with something more production ready like Resque or Sidekiq. This standardization should then allow Rails plug-ins (and Rails itself) to perform work asynchronously where it makes sense without having to worry about supporting all of the popular backends.
In preparation for my talk titled "How to fail at Background Jobs", I've been following activity on the Rails 4 Queue API. Recently, the Queue API was removed from the master branch, and pushed off until Rails 4.1 at the earliest.
What follows is my third party attempt to report on why. My main source of information is this commit on GitHub, but I'll also attempt to draw some conclusions based on my own experience with queueing systems.
Another interesting source of information comes in the comments with the very first commit to add a Queueing API to Rails: https://github.com/rails/rails/commit/adff4a706a5d7ad18ef05303461e1a0d848bd662
Basically, I see three failures with the Queue API as currently implemented in the "jobs" branch: https://github.com/rails/rails/tree/jobs
1. The API
The API as implemented is a "nice idea", but it's actually very un-Rails-like when compared to things like ActiveRecord.
Here's an example of enqueuing a job in the existing implementation:
For illustrative purposes only, a more Rails-like API might look like this:
The name of the queue should be a concern of the job (not the place that enqueued it). Imagine if you wanted to change the queue name, you'd have to change every enqueue-ing place to reference the new name.
Also, notice we have to do a little extra work in our implementation of run to fetch the user by ID instead of having our queuing system Marshall it for us. This leads me to the next failure...
2. Marshall vs. JSON
Jobs are generally run in a different process from where they were enqueued. This means serialization. The simplest choice for doing this would seem to be Ruby's built in Marshall. Rails took the approach of Marshalling an entire job class, while most other libraries use serialize the job arguments and job class name to JSON. It's a best practice in most other systems to store as little information as possible information about the job in the queue itself. A queue is an ordering system, information should be stored in a database.
The Marshall approach is a slightly nicer API for the developer, but quickly breaks down in practice. Care must be taken not to Marshall objects with too many relationships to other objects or Procs (which cannot be Marshalled in Ruby unless you are using the niche implementation: MagLev).
Finally, Marshalling is not as nice for Ops. Monitoring a running queue in production is much easier when you can easily inspect the contents of jobs. JSON is a much more portable format.
3. Solving the Wrong Problem
It seems one of the major goals of the Rails 4 Queue is to always send e-mails in the background. We could debate whether action_mailer really belongs as part of a Model-View-Controller framework in the first place, but I digress.
Let me re-word that a bit: One of the major goals of Rails 4 Queue is to ensure that the sending of e-mails does not adversely impact web response time.
Generally, this sort of thing is done using a background jobs system like Resque: you make a job that sends your e-mail. But Rails core thinks we can do better than that, we don't need a background job system if we can just make our web application server do the work after it's completed sending the response to the client.
Here's some terribly ugly and hacky code to demonstrate my point.
Example using thin: https://gist.github.com/jacobo/5164180
Example using Unicorn: https://gist.github.com/jacobo/5164192
If you run these rack apps and hit them with curl, you'll see that the "e-mail processing" does not interfere with the client receiving a response. But, it does tie up these single-threaded web servers. They won't serve the next request until they are finished with the previous after-request job.
Another approach might be to use threads, but unless you are on JRuby or Rubinius, you would likely slow down your response processing. As your e-mail sending thread will likely start executing and using up processing power that would otherwise be used to generate the response.
The only good way to solve this problem is to make changes to Rack itself, but I've yet to see a proposal on exactly what these might be.
I'm hoping to see the discussion continue. Maybe there's even an opportunity for other community members to step up and propose ideas about what the Rails 4 Queue API should look like. I think getting this right and shipping it will be a huge win for Rails developers everywhere who are currently duplicating effort working on a myriad of background job processing extensions and customizations coupled to their current backend queueing library.