There are many things we end up needing background jobs for, but the main reason is to provide a snappy, non-blocking user experience.
Whether the task is encoding a video file, importing a batch of data, or (in one case I ran into) Jabber instant messaging, we want to offload it from our web servers as quickly as possible.
There are lots of tools to accomplish this across many languages, including Resque, Sidekiq, delayed_job, node-schedule, beanstalkd and Amazon Simple Queue Service (SQS). And then there is my personal favorite: Gearman.
Gearman has client libraries for C, PHP, Ruby, Node.js, Python, Java, Perl and C#/.NET. It even includes tools that can be called from shell scripts, and user-defined functions for both MySQL and PostgreSQL.
Gearman itself is written in C, and is super simple. If you get a chance, I highly recommend checking out the source code. Note: Gearman was originally written in Perl and later rewritten in C. Be sure not to use the Perl version (e.g. dev-perl/Gearman* in Gentoo Portage).
The main reason I like gearmand is its simplicity. Gearman has three parts to it:
- GearmanClient submits tasks to the job queue
- gearmand is the job queue itself (running as a daemon)
- GearmanWorker retrieves the tasks from the job queue and handles them
By default, the Gearman queue is stored in memory, but you can also make it persistent by storing it in MySQL, PostgreSQL, memcached or SQLite. With memcached, obviously, if it's on the same machine as gearmand then you're likely to lose it just as easily as the regular in-memory queue. The only difference is that you could restart gearmand without losing the queue.
However, another potential option is to use the new MySQL 5.6 NoSQL Interface, which supports the memcached protocol. This should be faster than using the Gearman MySQL backend without sacrificing the persistence it brings.
Gearman obviously has the ability to run background jobs, since that is what this post is all about, but it also supports foreground jobs, which allow the GearmanClient and the GearmanWorker to communicate with each other using gearmand as the middle-man.
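To make the difference concrete, here is a minimal Ruby sketch of what a client puts on the wire, based on my reading of the Gearman binary protocol. As far as the request goes, a background submission differs from a foreground one only in its packet type. The `request_packet` helper and the `resize_avatar` function name are made up for illustration; a real client library builds these packets for you.

```ruby
# Packet types from the Gearman binary protocol.
SUBMIT_JOB    = 7   # foreground: the client stays connected and receives the result
SUBMIT_JOB_BG = 18  # background: the client only gets a job handle back

# Build a request packet: "\0REQ" magic, 4-byte big-endian type,
# 4-byte big-endian body size, then NUL-separated arguments.
# (request_packet is a hypothetical helper, not part of any gem.)
def request_packet(type, *args)
  body = args.map { |arg| arg.to_s.b }.join("\0".b)
  "\0REQ".b + [type, body.bytesize].pack("NN") + body
end

# Both packet types take the same arguments: function name, unique id, workload.
payload    = '{"username":"alice"}'
foreground = request_packet(SUBMIT_JOB, "resize_avatar", "", payload)
background = request_packet(SUBMIT_JOB_BG, "resize_avatar", "", payload)

# Print the decoded header (magic, type, size) of each packet.
puts foreground.unpack("a4NN").inspect
puts background.unpack("a4NN").inspect
```

Everything after the type field is byte-for-byte identical, which is why switching a job between foreground and background is usually a one-line change in client code.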
The best thing about Gearman is that you can use different languages for different pieces. Say you build your website in PHP, but it's not the best option for wrangling text: you schedule a job with gearmand, and a Python worker picks it up. Or Ruby, or Node.js, or… you get the idea.
This allows us to pick the correct tool for every task in our stack. Why work around the pitfalls of our primary language when we can simply pick up a better tool and do things right?
First we are going to use PHP to schedule a task with the job queue. This uses the pecl/gearman extension.
In this simple example we create an instance of the \GearmanClient class, tell it to connect to the default server (localhost:4730) and send a background task (
Next we ensure that the task was added successfully, and return the job handle.
We might call it with something like this, passing in the username:
We would then want to store the handle so that we can later check the status of the task.
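The client in this post is PHP, but the exchange is the same in any language, so here is a rough Ruby sketch of it at the wire level: send a SUBMIT_JOB_BG packet, expect a JOB_CREATED packet back, and keep the handle it carries. The helper names (`request_packet`, `read_response`, `submit_background`) and the `example_task` function name are my own placeholders, not part of pecl/gearman or any gem.

```ruby
require "socket"
require "json"

SUBMIT_JOB_BG = 18  # client -> server: queue a background job
JOB_CREATED   = 8   # server -> client: job accepted, body is the job handle

# Build a request packet: magic, big-endian type and size, NUL-separated args.
def request_packet(type, *args)
  body = args.map { |arg| arg.to_s.b }.join("\0".b)
  "\0REQ".b + [type, body.bytesize].pack("NN") + body
end

# Read one response packet from an IO-like object and return [type, args].
def read_response(io)
  header = io.read(12)
  raise IOError, "connection closed" if header.nil? || header.bytesize < 12
  magic, type, size = header.unpack("a4NN")
  raise IOError, "unexpected magic #{magic.inspect}" unless magic == "\0RES".b
  body = size.zero? ? "".b : io.read(size)
  [type, body.split("\0".b, -1)]
end

# Submit a background job with a JSON workload and return the job handle.
def submit_background(io, function, data)
  io.write(request_packet(SUBMIT_JOB_BG, function, "", JSON.generate(data)))
  type, args = read_response(io)
  raise "job was not queued (packet type #{type})" unless type == JOB_CREATED
  args.first
end

# Usage, assuming gearmand is listening on the default localhost:4730:
#   socket = TCPSocket.new("localhost", 4730)
#   handle = submit_background(socket, "example_task", { "username" => "alice" })
#   # store `handle` (database, session, ...) so the status can be checked later
```

Nothing here touches the network until you open the socket yourself, which makes the packet handling easy to try out against a fake IO object.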
Next we'll create a worker, this time in Ruby:
Here we use the gearman-ruby gem to create a Gearman::Worker, and then register the task handler.
In this case, we first decode the JSON data passed in from our GearmanClient and then find our user in the database by the username. We then call the
For something that takes more time, you could send back a running status. The job variable is an instance of the Gearman::Worker::Job class, which allows you to respond using
It's important to note that you can run as many workers for each task as you'd like. Gearman will not hand the same job to multiple workers (though there is a retry config option should a job fail), and because workers pull jobs rather than having jobs pushed at them, the queue cannot overload them, though you may run out of free workers. The number of workers you run can also act as a way to manage priority (higher priority jobs get more workers) and to balance resources.
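The pull model is easier to see at the protocol level. The sketch below is my own simplification of the worker side of the Gearman binary protocol, not the gearman-ruby gem's internals. A worker announces its functions with CAN_DO, asks for work with GRAB_JOB, and only asks again once it has finished the job it was given. The `startup_requests` and `next_requests` helpers and the `reverse` function are hypothetical names for illustration.

```ruby
# Worker-side packet types from the Gearman binary protocol.
CAN_DO        = 1
PRE_SLEEP     = 4
NOOP          = 6
GRAB_JOB      = 9
NO_JOB        = 10
JOB_ASSIGN    = 11
WORK_COMPLETE = 13
WORK_FAIL     = 14

# Build a request packet: magic, big-endian type and size, NUL-separated args.
def request_packet(type, *args)
  body = args.map { |arg| arg.to_s.b }.join("\0".b)
  "\0REQ".b + [type, body.bytesize].pack("NN") + body
end

# On connect: register every function we can handle, then ask for a job.
def startup_requests(handlers)
  handlers.keys.map { |name| request_packet(CAN_DO, name) } + [request_packet(GRAB_JOB)]
end

# Decide what to send next, given one packet received from gearmand.
# `args` is the packet body already split into its fields.
def next_requests(type, args, handlers)
  case type
  when NOOP
    # The server woke us up because work may be available.
    [request_packet(GRAB_JOB)]
  when NO_JOB
    # Queue is empty: sleep until the server sends NOOP.
    [request_packet(PRE_SLEEP)]
  when JOB_ASSIGN
    handle, function, workload = args
    begin
      result = handlers.fetch(function).call(workload)
      [request_packet(WORK_COMPLETE, handle, result), request_packet(GRAB_JOB)]
    rescue StandardError
      [request_packet(WORK_FAIL, handle), request_packet(GRAB_JOB)]
    end
  else
    []
  end
end

handlers = { "reverse" => ->(workload) { workload.reverse } }
# Simulate the server assigning one job, then reporting an empty queue.
[[JOB_ASSIGN, ["H:example:1", "reverse", "hello"]], [NO_JOB, []]].each do |type, args|
  sent = next_requests(type, args, handlers).map { |packet| packet.unpack("a4NN")[1] }
  puts "received #{type}, replying with packet types #{sent.inspect}"
end
```

Because the next GRAB_JOB is only sent after WORK_COMPLETE or WORK_FAIL, a worker written this way never holds more than one job at a time, no matter how deep the queue gets.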
Checking the Status
This creates an HTTP server on port 8000 that, when passed a handle via GET arguments, will return the status.
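Whatever language the status endpoint is written in, it ends up sending a GET_STATUS packet and decoding a STATUS_RES reply. Here is a small Ruby sketch of that decoding step, based on the field order in the Gearman binary protocol. The `JobStatus` struct, the `parse_status` helper and the example handle are assumptions of mine for illustration.

```ruby
GET_STATUS = 15  # client -> server: body is the job handle
STATUS_RES = 20  # server -> client: handle, known, running, numerator, denominator

# Build a request packet: magic, big-endian type and size, NUL-separated args.
def request_packet(type, *args)
  body = args.map { |arg| arg.to_s.b }.join("\0".b)
  "\0REQ".b + [type, body.bytesize].pack("NN") + body
end

JobStatus = Struct.new(:handle, :known, :running, :numerator, :denominator) do
  # Percent complete, or nil when the worker has not reported any progress.
  def percent
    denominator.zero? ? nil : 100.0 * numerator / denominator
  end
end

# Decode the body of a STATUS_RES packet into a JobStatus.
def parse_status(body)
  handle, known, running, numerator, denominator = body.b.split("\0".b, 5)
  JobStatus.new(handle, known == "1", running == "1", numerator.to_i, denominator.to_i)
end

# The request an HTTP handler would send for a handle taken from the query string.
request = request_packet(GET_STATUS, "H:example:1")
puts request.unpack("a4NN").inspect

# A reply body as gearmand would send it for a job that is 40/100 done.
status = parse_status(["H:example:1", "1", "1", "40", "100"].join("\0"))
puts "known=#{status.known} running=#{status.running} percent=#{status.percent}"
```

A `known` value of false means the server has no record of that handle, so the endpoint should treat it as "nothing to report" rather than as an error.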
Using Gearman with Engine Yard Cloud
To make Gearman part of the background job processing on your Engine Yard Cloud account, you need to create a custom Chef recipe that compiles it yourself (Chef recipes can be used to take advantage of software outside of the current stack). For more details on using Chef with Engine Yard Cloud, check out our knowledge base.
As with all background jobs, best practices recommend that Gearman be run on a Utility Instance, so that jobs are processed without interfering with the Application Instances themselves.
Can't we all just get along?
So, as you can see, Gearman can act like glue between the various parts of your application. It's super fast, has low resource usage and can be used with almost any language you can think of.
Additionally, it not only supports foreground tasks (with communication between client and worker), but can also prioritize jobs into high, standard and low priority queues.
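At the protocol level, priority and foreground/background are both encoded in the packet type a client uses to submit the job. The lookup below is a small Ruby sketch using the type numbers from the Gearman binary protocol. The `submit_type` helper and the `:normal` label for the standard queue are my own naming, not any library's API.

```ruby
# Submit packet types, keyed by [priority, background?].
SUBMIT_TYPES = {
  [:normal, false] => 7,   # SUBMIT_JOB
  [:normal, true]  => 18,  # SUBMIT_JOB_BG
  [:high,   false] => 21,  # SUBMIT_JOB_HIGH
  [:high,   true]  => 32,  # SUBMIT_JOB_HIGH_BG
  [:low,    false] => 33,  # SUBMIT_JOB_LOW
  [:low,    true]  => 34   # SUBMIT_JOB_LOW_BG
}.freeze

# Pick the packet type for a job (hypothetical helper for illustration).
def submit_type(priority: :normal, background: false)
  SUBMIT_TYPES.fetch([priority, background]) do
    raise ArgumentError, "unknown priority #{priority.inspect}"
  end
end

puts submit_type(priority: :high, background: true)
puts submit_type(priority: :low)
```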
You can also easily scale Gearman, as the clients and workers both support multiple servers, allowing you to spread your queue and your workers out over multiple machines.
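As a rough idea of what "multiple servers" can look like from the client side, here is a Ruby sketch of one simple strategy: try each gearmand in order and use the first one that accepts a connection. This is an assumption for illustration only; real client libraries have their own server selection logic. The `first_reachable` helper and the host names in the usage comment are hypothetical.

```ruby
require "socket"

# Try each [host, port] pair in order and return the first connection that
# succeeds. `connect` is injectable so the logic can be exercised without a
# network; by default it opens a plain TCP socket.
def first_reachable(servers, connect: ->(host, port) { TCPSocket.new(host, port) })
  errors = []
  servers.each do |host, port|
    begin
      return connect.call(host, port)
    rescue SystemCallError, SocketError => e
      # Remember why this server was skipped and move on to the next one.
      errors << "#{host}:#{port} (#{e.class})"
    end
  end
  raise IOError, "no gearmand reachable: #{errors.join(', ')}"
end

# Usage, assuming two gearmand instances on the default port:
#   socket = first_reachable([["queue1.example.com", 4730], ["queue2.example.com", 4730]])
```

Workers would typically do the opposite and stay connected to every server at once, so that jobs queued on any of them get picked up.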
I highly recommend checking it out at http://gearman.org.