Caching is extremely useful for web applications. While most web applications benefit from it, there are times when caching is unnecessary and becomes a time sink for developers. So when is it a good idea to use caching? When an application is getting a lot of requests and New Relic detects a strain on your instances, it's probably time to look into it. There are a few different types of caching and some good resources to help decide which type is best for your application. We will look at memory caches using Memcached or Redis, and HTTP caches such as Varnish and Rack::Cache. If you are using Rails, you can easily use its built-in caching. Check out the Rails Guide Caching with Rails for an overview.
Redis is a key-value store. With Redis, your data is held in memory and can be persisted to disk if necessary, which makes it well suited to caching. Redis is used by companies such as GitHub, Craigslist, and here at Engine Yard.
Image courtesy of Redis 101 from Peter Cooper
Let us look at how Redis can be useful for a web application. Most web application requests return a variety of different lists such as posts, comments, followers, etc. The majority of key-value stores store each of these lists as a single unit (a "blob"). As a result, typical list operations, such as adding an element, are inefficient because the whole blob has to be read and rewritten. Fortunately, Redis has native list support, which allows it to perform operations on lists very efficiently.
Counter caching in Rails allows you to accelerate performance by reducing the number of SQL queries and preventing unnecessary instantiation of objects, but the Rails implementation, which uses generic SQL features, does not scale well. Engine Yard customer MUBI uses Redis to replace Rails' default counter caching with speedy Redis counters. Redis allows counter caches to be implemented with extremely fast, non-blocking atomic operations.
Let's explore the benefits of counter caching using an example in which Post has_many :comments and Comment belongs_to :post. Without a counter cache, fetching the comment count for a post results in three SQL queries.
Caching allows us to instead use a single query by adding the following relationship in comment.rb:
belongs_to :post, :counter_cache => true
We also have to update the Post table with a new attribute:
add_column :posts, :comments_count, :integer
Now there is just one trip to the database to fetch the comment count:
User Load (0.4ms) SELECT * FROM posts
ORDER BY posts.id DESC LIMIT 1
Why stop there? We can also utilize Redis for counter caching. When we do a count query in SQL, we can write the result to a Redis key. For example:
SELECT count(*) AS count_all FROM comments
WHERE post_id = 1
is also written to a Redis key for that post.
When pulling post records from the database, we can test for the existence of the relevant Redis keys for whatever counts we need. If they exist, we're done. If they do not, the counts are looked up in MySQL and pushed into Redis, ready for next time. Also, because we use the Redis INCR command to update a count, we avoid a SQL query except to initialize it, and because INCR is an atomic operation, the count always reflects the exact number of times it was called, without race conditions.
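The read-through and increment flow described above can be sketched in a few lines of Ruby. This is a minimal illustration, not MUBI's implementation: the CommentCounter class, the key format, and the MemoryStore stand-in are all assumptions made for the example. MemoryStore only mimics the GET, SET and INCR commands so the sketch runs without a server; with the redis gem you would pass a real client instead.

```ruby
# In-memory stand-in for the three Redis commands used below, so the
# sketch runs without a Redis server. Hypothetical helper, not a real API.
class MemoryStore
  def initialize
    @data = {}
  end

  def get(key)
    @data[key]
  end

  def set(key, value)
    @data[key] = value.to_s
    "OK"
  end

  def incr(key)
    value = @data.fetch(key, "0").to_i + 1
    @data[key] = value.to_s
    value
  end
end

# Hypothetical counter cache; class name and key format are illustrative.
class CommentCounter
  def initialize(store)
    @store = store
  end

  # Returns the cached count. The block (for example a SQL COUNT query)
  # runs only when the key is missing, and its result is cached.
  def count(post_id)
    key = key_for(post_id)
    cached = @store.get(key)
    return cached.to_i unless cached.nil?

    fresh = yield
    @store.set(key, fresh)
    fresh
  end

  # Call after a comment is created. If the key has not been initialized
  # yet, do nothing: the next read will load the full count from SQL.
  def increment(post_id)
    key = key_for(post_id)
    return nil if @store.get(key).nil?

    @store.incr(key)
  end

  private

  def key_for(post_id)
    "post:#{post_id}:comments_count"
  end
end

counter = CommentCounter.new(MemoryStore.new)
sql_calls = 0
first = counter.count(1) { sql_calls += 1; 3 }   # miss: the block runs
counter.increment(1)                             # new comment, no SQL
second = counter.count(1) { sql_calls += 1; 3 }  # hit: the block is skipped
puts "first=#{first} second=#{second} sql_calls=#{sql_calls}"
```

Skipping the increment when the key is missing is a design choice for this sketch: INCR on a missing key starts from zero, which would undercount a post that already has comments.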
Some people have called Redis "Memcached on steroids." However, that does not mean Memcached should be disregarded. There have been a lot of benchmarks comparing Redis and Memcached, and a lot of debate about the accuracy of those benchmarks. What it really comes down to is which one you believe is best suited for your application.
One difference to consider is that Memcached does Least-Recently-Used (LRU) eviction of values from the cache. With Redis, data is evicted only when it is explicitly removed or expires, and it will store as much data as you put into it. As of Redis 2.2, you can configure Redis with the maxmemory option instead of setting expires to get an LRU cache, but it is an option that you have to enable and is not the default. Memcached is used by companies like Twitter, Reddit and Zynga. A final thing to consider is that Memcached has been integrated into Rails since Rails 2.1, making it even easier to use.
Memcached keeps its values in RAM, so it's a transitory cache. Keep in mind that it discards the least recently used values, so you cannot assume that data stored in Memcached will still be there when you need it. As stated earlier, it's very important to make sure it's right for your application, because Memcached is slower than SELECT on localhost for small sites; you should ensure it can keep up with the requests, or it won't help you to use it.
A good use for Memcached is action caching. Action caching is a lot like page caching, but the flow is slightly different. With action caching, the incoming web request goes from the web server to the Rails stack, so filters can run before the cached response is served. One issue with page caching that the Rails guides go through is that you cannot use it for pages that need to restrict access somehow.
For example, you might want to allow only authenticated users to edit or create a Product object, but still cache those pages.
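To make the flow concrete, here is a plain-Ruby sketch of what action caching does for such a page. It is not Rails code: the ActionCacheSketch class and its methods are hypothetical, and a Hash stands in for Memcached. In a Rails controller of this era the same idea is normally expressed with a before_filter for authentication plus caches_action on the relevant actions.

```ruby
# Plain-Ruby illustration of the action caching flow; not Rails code.
class ActionCacheSketch
  attr_reader :renders

  def initialize
    @cache = {}    # stands in for Memcached
    @renders = 0   # how many times the expensive action body ran
  end

  # Every request reaches the stack, so the access check always runs,
  # even when the response body is already cached.
  def handle(path, user)
    return [401, "Unauthorized"] unless authenticated?(user)

    @cache[path] ||= render(path)
    [200, @cache[path]]
  end

  private

  def authenticated?(user)
    !user.nil?
  end

  def render(path)
    @renders += 1
    "rendered #{path}"
  end
end

app = ActionCacheSketch.new
anonymous = app.handle("/products/1/edit", nil)      # rejected, nothing cached
first     = app.handle("/products/1/edit", "alice")  # miss: renders the page
second    = app.handle("/products/1/edit", "bob")    # hit: served from cache
puts "#{anonymous[0]} #{first[0]} #{second[0]} renders=#{app.renders}"
```

With page caching the web server would hand the cached file to anyone, including the anonymous request, which is why it does not fit pages with restricted access.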
Also, do not forget to set up your configuration for Memcached. An older post, written when Rails 2.1 gained better-integrated caching, has a good amount of information on this.
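As a rough idea of what that configuration looks like, the fragment below points the Rails cache store at Memcached. The file path and the server address are placeholders chosen for illustration, and the exact place these lines go depends on your Rails version, so treat it as a sketch rather than a drop-in setting.

```ruby
# config/environments/production.rb (fragment)
# "localhost:11211" is a placeholder; use your own Memcached host(s).
config.cache_store = :mem_cache_store, "localhost:11211"

# Caching is typically switched off in development, so make sure it is
# enabled in the environment where you expect cache hits.
config.action_controller.perform_caching = true
```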
HTTP caching is another form of caching you can utilize. If you are not familiar with HTTP caching, this blog post offers a nice overview. Two useful projects worth checking out are Varnish and Rack::Cache. Now you might be asking, "Why do people use them?" One reason is to reduce latency: a request satisfied from the cache takes less time, which makes the Web seem more responsive. For example, if you have dynamically generated content, using an HTTP cache like Varnish will result in better performance than using Memcached, because on a cache hit your application server is not accessed at all.
Varnish provides you with a default setup, found in default.vcl, that will work for most applications. However, you can go in and customize it, which is recommended, since Varnish assumes things that might not be correct about your application. Beyond that, the main work you have to do is ensuring your resources have appropriate HTTP caching parameters (Expires/max-age and ETag/Last-Modified). Do not forget to normalize the hostname to avoid caching the same resource multiple times. Another thing to remember is that Varnish was meant to run on 64-bit machines. It will work on 32-bit, but you will definitely run into some issues. Other recommendations I would make are to keep your VCL simple and to tune only when you really need to, using Varnish tips and best practices that others have found useful.
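The caching parameters mentioned above are ordinary response headers that your application sets. The sketch below is a bare Rack-style app (any object that responds to call) that sends max-age and an ETag and answers a conditional request with 304. The CacheableApp name, the 300-second lifetime, and the MD5-based ETag are assumptions for the example, not something Varnish requires.

```ruby
require "digest/md5"

# A bare Rack-style app; it needs no gems because a Rack app is just an
# object with a #call(env) method returning [status, headers, body].
class CacheableApp
  MAX_AGE = 300 # seconds; an arbitrary lifetime chosen for illustration

  def call(env)
    body = "Hello from #{env["PATH_INFO"]}"
    etag = %("#{Digest::MD5.hexdigest(body)}")
    headers = {
      "cache-control" => "public, max-age=#{MAX_AGE}",
      "etag" => etag
    }

    # Conditional GET: if the client's validator still matches, reply
    # 304 with no body so the cache can keep serving its stored copy.
    if env["HTTP_IF_NONE_MATCH"] == etag
      [304, headers, []]
    else
      [200, headers.merge("content-type" => "text/plain"), [body]]
    end
  end
end

app = CacheableApp.new
status, headers, _body = app.call("PATH_INFO" => "/posts")
revalidated = app.call("PATH_INFO" => "/posts",
                       "HTTP_IF_NONE_MATCH" => headers["etag"]).first
puts "#{status} then #{revalidated}"
```

A cache in front of this app can serve the response for five minutes without touching the application, then revalidate it cheaply with the ETag.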
Another useful way to utilize HTTP caching is with Ryan Tomayko's Rack::Cache. A key aspect of Rack::Cache is that it is a piece of middleware that sits in front of each backend process, so it does not require the infrastructure investment of a separate daemon process like Varnish.
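Because it is middleware, enabling Rack::Cache is mostly a matter of adding it to your rackup file. The fragment below is a sketch that assumes the rack-cache gem is installed; the storage paths and the inline demo app are placeholders for illustration.

```ruby
# config.ru (fragment)
require "rack/cache"

# Store cache metadata and response bodies on disk. The paths are
# placeholders; other storage backends are available.
use Rack::Cache,
  :verbose     => true,
  :metastore   => "file:/var/cache/rack/meta",
  :entitystore => "file:/var/cache/rack/body"

# Stand-in application. Responses are cached only when they carry
# caching headers such as the max-age below.
run lambda { |env|
  [200,
   { "content-type" => "text/plain", "cache-control" => "public, max-age=60" },
   ["hello"]]
}
```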
Varnish has plenty of great examples and real world use cases on their site that can help you truly understand the usefulness it provides. Check them out to get a feel for how you can utilize it for your application. Also, if you want to test out Varnish on AppCloud take a look at the Chef recipe we have for it.
Now, it's up to you to decide whether all or some of these types of caching will be beneficial for you. Make sure to utilize them to the fullest potential to ensure you and your customers have the most enjoyable experience possible. If you have any caching experiences or gotchas, please share them in the comments.
Redis 101 presentation
A Collection of Redis Use Cases
To Redis or Not To Redis?
Memcached Basics for Rails
Caching, Memcached and Rails
Scaling Rails with Memcached
Things Caches Do
Caching Tutorial
You're Doing It Wrong