Our latest post is from a special guest and Engine Yard partner Xavier Shay. He'll be running a pair of training sessions on "using your database to make your Ruby on Rails applications rock solid" at Engine Yard's San Francisco office on the 24th and 31st of July. Visit www.dbisyourfriend.com for course and registration details.
Bob opens a database transaction and selects everything from the books table. Tom comes along and adds a new book, then Bob, in his same transaction, repeats his same query for all the books. Does Bob see the new book that Tom added?
The answer is that you get to choose! It's important to understand what your choices are (and what choice your preferred database makes for you) so that you can ensure your code executes in a way that you intend.
The SQL standard specifies levels for how "isolated" transactions running at the same time are, all the way from being able to see uncommitted changes (not isolated) to effectively running the transactions in serial (full isolation). Academically there are eight levels of isolation, but for most purposes you only need to worry about the four defined by the standard. MySQL implements all four, PostgreSQL only two. You can specify a global isolation level for your database, but also override it for individual transactions.
The easiest to understand are the extreme levels: no isolation and total isolation. The first of these is known as read uncommitted, and it allows Bob to read the new book that Tom is adding even before Tom has committed his changes. As you can imagine this level is mostly useless, however it can very occasionally be handy in some reporting situations.
At the other end of the spectrum is full isolation, known in the spec as serializable. Bob will never see the new book that Tom is adding until he starts a new transaction. The database Bob sees is consistent---within the one transaction, the same query will always return the same result. At first glance this level seems like a great option but there's a lot of overhead involved, it drastically reduces the amount of concurrency you can achieve, and for most purposes the serializable level is overkill.
There are two isolation levels in between read uncommitted and serializable, they are read committed and read repeatable, and this is where it gets interesting. Read committed is the default isolation level in PostgreSQL and Oracle, and is one step up from read uncommitted. It is the most "common sense" level: Bob will not see any changes made by Tom until Tom commits them.
MySQL defaults to read repeatable. In this level, Bob will not see any updates Tom commits, but will see any inserts. Say in Bob's first select he sees one book titled "The Odessey". Tom then fixes the spelling mistake to "The Odyssey", and also add Homer's other epic poem "The Iliad". When Bob selects all books again, he will see "The Odessey" (old title, no spelling fix) and "The Iliad" (the inserted book).
To summarize, the four levels from least isolated to most isolated are: read uncommitted, read committed, repeatable read, and serializable. They define what types of changes made by Tom that Bob will be able to see within a single transaction.
Say the books we are selecting are ordered based on an arbitrary position column (they're on our bookshelf, for instance). Assume read committed isolation level.
Bob wants to move "The Odyssey" to the bottom position. To do this, he needs to update its position to the bottom of the list (position 4), then subtract 1 from all positions. At the same time, Tom is adding a new book "The Cypria". Working this through:
- Bob checks the bottom position, finds it to be 4
- Tom inserts "The Cypria" in the bottom position of 4
- Bob updates the position of "The Odyssey" to 4
- Bob subtracts 1 from all positions, and since he is using read committed he will "see" and update the newly inserted book.
Both "The Odyssey" and "The Cypria" have a position of 3
Title | Position
The Iliad | 1 The Nostoi | 2 The Odyssey | 3 The Cypria | 3
If Bob had used the serializable level, the list would have remained consistent for his entire transaction, so his update would not have affected "The Cypria" that Tom inserted, and so would not have updated its position from 4 to 3. (In practice the way databases normally handle this is to actually abort one of the transactions with an error.)
For those using Rails, you may have recognized the above scenario as a typical @acts_as_list@ scenario, and you'd be correct. In a default configuration, the @acts_as_list@ plugin makes the same mistake outlined above, and will leave you with inconsistent data. The quickest fix is to wrap all list operations in a serializable transaction.
(It may have occurred to you that some locking or a unique index on position could avoid the exact scenario above, but that breaks @acts_as_list@ and fails to address some other edge cases left as an exercise for the reader. The main point for the purpose of this article is to understand why it breaks under read committed, but works under serializable.)
As a general rule, read committed is a sensible default. It's easy to reason about, fast, and forces you to be explicit about your locking strategy. Jump up to serializable when needed, usually when dealing with ranges. MySQL's repeatable read default can be confusing and deadlock in unintuitive ways, as such it is not recommended.
This has been a very brief introduction to the four standard SQL isolation levels: read uncommitted, read committed, repeatable read, and serializable. Hopefully it has helped you get your head around them. I'll be going into much more detail with practical hands on exercises in my training days at Engine Yard's San Francisco office on the 24th and 31st of July. Visit www.dbisyourfriend.com for course and registration details.