Hello all. The Engine Yard blog is back in action after taking a break following JRuby 1.5, Rubinius 1.0, the introduction of xCloud, RailsConf and (very soon) Rails 3. Our latest post is from a special guest and Engine Yard partner, Xavier Shay. He'll be running a pair of training sessions on 'using your database to make your Ruby on Rails applications rock solid' at Engine Yard's San Francisco office on the 24th and 31st of July. Visit www.dbisyourfriend.com for course and registration details.
Your Ruby on Rails code is run concurrently, whether you like it or not.
Concurrency is a staple term when talking about hosting infrastructure, but it is too often brushed aside when discussing actual code bases. This attitude is especially prevalent in the Ruby on Rails community: I can't name one popular plugin that gets it right. In this post I will address problems with the typical state machine pattern used by Rails applications, and show you how to address them and make your code bullet-proof.
Consider the following controller action, backing a big green "ship button" next to a purchase order:
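A minimal sketch of such an action, assuming a model named Order with a ship! method. The Order class and the params hash here are plain-Ruby stand-ins (not the real Rails classes) so that the snippet runs on its own; a real controller action would also render or redirect.

```ruby
def ship(params)
  order = Order.find(params[:id])   # load the order
  order.ship!                       # move it to the "shipped" state
  order
end

# Stand-in for an ActiveRecord model (assumption: a real application
# would subclass ActiveRecord::Base and persist to a database).
class Order
  ROWS = { 1 => "unshipped" }       # pretend table: id => state

  attr_reader :id, :state

  def self.find(id)
    new(id, ROWS.fetch(id))
  end

  def initialize(id, state)
    @id = id
    @state = state
  end

  def ship!
    return if state == "shipped"    # guard: only ship once
    # ... send emails, update caches, transfer funds ...
    @state = "shipped"
    ROWS[id] = @state               # "save" the new state
  end
end
```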
Imagine two users both press the "ship" button at the same time. (Or, as often happens, one user double-clicks the button.) The two requests will hit the load balancer and be distributed to different processes. What happens when the above code, typical of many Rails applications, is run in two different places at the same time?
Both processes will load the order from the database at line 2 (the find call). At line 3, when the ship! method is run, both processes will check the attributes of the order and see that it is currently unshipped. Both then execute the shipping code, which may include sending emails, updating caches, and transferring funds. As a result, the customer will receive duplicate emails or, worse, be charged twice. All versions of acts_as_state_machine (AASM) exhibit this behavior.
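The interleaving described above can be reproduced by hand without a database. This is only a sketch: TABLE, EMAILS and simulate_double_click are hypothetical names, the Order class is a plain-Ruby stand-in for a model, and the two "requests" are simulated by loading two in-memory copies before either one saves.

```ruby
TABLE  = { 1 => "unshipped" }   # pretend orders table: id => state
EMAILS = []                     # records each "shipping email" sent

class Order
  attr_reader :id, :state

  def self.find(id)
    new(id, TABLE.fetch(id))    # each caller gets its own in-memory copy
  end

  def initialize(id, state)
    @id = id
    @state = state
  end

  def ship!
    return if state == "shipped"
    EMAILS << "Order #{id} has shipped"
    @state = "shipped"
    TABLE[id] = state           # "save" the new state
  end
end

# Both requests read the row before either one writes it back.
def simulate_double_click
  TABLE[1] = "unshipped"
  EMAILS.clear
  a = Order.find(1)   # request A sees "unshipped"
  b = Order.find(1)   # request B sees "unshipped" too
  a.ship!
  b.ship!             # B's copy is stale, so its guard passes as well
  EMAILS.size
end

puts simulate_double_click   # prints 2
```

The guard inside ship! is not enough on its own, because each copy checks state that was read before the other request wrote.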
Any time you read data from the database with the intention of making changes based on that data ("ship the order if it isn't already shipped") you must obtain an exclusive database lock on the row. The database will then block any other session that tries to lock or modify that row until the session holding the lock concludes its transaction (COMMIT or ROLLBACK). ActiveRecord supports this through its pessimistic locking API.
Working through the above example again, the first process to execute the find will issue the following SQL:
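The statement has roughly the shape built by the helper below. This is a sketch, not captured output: locking_select_sql is a hypothetical helper, the table name orders and the id 1 are assumptions, and the exact quoting and column list depend on the Rails version and database adapter.

```ruby
# Hypothetical helper that only builds a string for illustration.
# Integer() rejects anything that is not a whole number.
def locking_select_sql(table, id)
  "SELECT * FROM #{table} WHERE id = #{Integer(id)} FOR UPDATE"
end

puts locking_select_sql("orders", 1)
# prints: SELECT * FROM orders WHERE id = 1 FOR UPDATE
```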
Notice the "FOR UPDATE" on the end; this instructs the database to place an exclusive lock on the row. When the second process executes the find and submits the above SQL to the database, the database will wait for the first transaction to complete (after calling ship! and updating the state of the order) before reading and returning the row. The returned row will now have a state of "shipped", and as such the ship! method will effectively be a no-op (no operation). The customer will only receive one email.
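The same walk-through can be sketched in plain Ruby, with a Mutex standing in for the database's row lock and two threads standing in for the two processes. Everything here (ROW_LOCK, find_locked, ship_with_row_lock, simulate_locked_double_click) is a hypothetical stand-in. In Rails of this era the locking find itself is usually written Order.find(id, :lock => true) inside a transaction, and newer versions use Order.lock.find(id).

```ruby
ROW_LOCK = Mutex.new              # stand-in for the database's row lock
TABLE    = { 1 => "unshipped" }   # pretend orders table: id => state
EMAILS   = []                     # records each "shipping email" sent

class Order
  attr_reader :id, :state

  # Stand-in for a locking find. It must be called while holding
  # ROW_LOCK, the way SELECT ... FOR UPDATE runs inside a transaction.
  def self.find_locked(id)
    new(id, TABLE.fetch(id))
  end

  def initialize(id, state)
    @id = id
    @state = state
  end

  def ship!
    return if state == "shipped"
    EMAILS << "Order #{id} has shipped"
    sleep 0.01                    # widen the window a race would need
    @state = "shipped"
    TABLE[id] = state
  end
end

def ship_with_row_lock(id)
  ROW_LOCK.synchronize do         # lock taken: the other thread waits here
    Order.find_locked(id).ship!
  end                             # lock released, like COMMIT
end

def simulate_locked_double_click
  TABLE[1] = "unshipped"
  EMAILS.clear
  2.times.map { Thread.new { ship_with_row_lock(1) } }.each(&:join)
  EMAILS.size
end

puts simulate_locked_double_click   # prints 1
```

Whichever thread gets the lock second reads the row only after the first has saved it, so its guard sees "shipped" and sends nothing.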
It is also possible using ActiveRecord to lock an object that has already been loaded from the database, by calling lock! on it. This is equivalent to a reload, but adds the "FOR UPDATE" suffix necessary for a database lock. It is an extra SQL statement (the order is selected twice), but is an easier pattern to abstract away.
Using alias_method_chain, we can continue to use exactly the same controller code we started with (just a plain call to ship!), and locking is handled for us in the background.
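A rough sketch of that wrapping, under stated assumptions: the Order class, its transaction and lock! methods, and demo_stale_copies are plain-Ruby stand-ins that imitate the ActiveRecord behavior described above, and the two alias_method calls are written out by hand because alias_method_chain comes from ActiveSupport rather than core Ruby.

```ruby
class Order
  ROWS     = { 1 => "unshipped" }  # pretend table: id => state
  ROW_LOCK = Mutex.new             # stand-in for the row lock
  EMAILS   = []                    # records each "shipping email" sent

  attr_reader :id, :state

  def self.find(id)                # plain, non-locking read
    new(id, ROWS.fetch(id))
  end

  # Stand-in for a transaction block: holds the lock until the block ends.
  def self.transaction
    ROW_LOCK.synchronize { yield }
  end

  def initialize(id, state)
    @id = id
    @state = state
  end

  # Stand-in for lock!: re-read the row (the real one reloads FOR UPDATE).
  def lock!
    @state = ROWS.fetch(id)
    self
  end

  def ship!
    return if state == "shipped"
    EMAILS << "Order #{id} has shipped"
    @state = "shipped"
    ROWS[id] = @state
  end

  def ship_with_lock!
    self.class.transaction do
      lock!                        # refresh state under the lock
      ship_without_lock!           # original ship!, now safe to run
    end
  end

  # By hand, what alias_method_chain :ship!, :lock would set up:
  alias_method :ship_without_lock!, :ship!
  alias_method :ship!, :ship_with_lock!
end

def demo_stale_copies
  Order::ROWS[1] = "unshipped"
  Order::EMAILS.clear
  a = Order.find(1)
  b = Order.find(1)   # stale copy, as in the double-click scenario
  a.ship!
  b.ship!             # reloads under the lock, sees "shipped", does nothing
  Order::EMAILS.size
end

puts demo_stale_copies   # prints 1
```

Callers still write order.ship! and never see the lock. Later Rails versions deprecated alias_method_chain in favor of Module#prepend, which can express the same wrapper.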
Lost updates or duplicate execution won't be a problem for every website, but if you are starting to worry about the concurrency of your hosting infrastructure, it's worth having a look over your code too.
If you'd like to join me for some hands-on work with this, I'll be running two classes at Engine Yard's San Francisco office on the 24th and 31st of July. Visit www.dbisyourfriend.com for course and registration details.