
Rubinius 1.2 was just released and is available at http://rubini.us. There are a number of new features and improvements since 1.0.

LLVM 2.8

We've upgraded to using LLVM version 2.8, the latest released version. LLVM powers the high performance compiler Rubinius uses to compile Ruby code all the way down to machine code. This brings some minor performance improvements related to better optimizations, but this mostly paves the way for future high-level optimizations that we'll be implementing with new LLVM 2.8 features.

Bytecode Verifier

More and more people are beginning to use Rubinius as a platform for other languages, not just Ruby. This is a development we're all super excited about, but it meant we needed to get a little more serious about making sure that bytecode can't crash the VM. Previously, we got along fine with no verifier simply because there was only one piece of code that generated Rubinius bytecode, the Rubinius compiler. Now with others also doing so, we needed to make sure they're generating valid bytecode. The bytecode verifier is a VM operation that is performed lazily, when a method is first invoked. It makes sure that the bytecode is consistent, for instance, only using the amount of stack it has requested and only using the proper number of local variables. With this new safety net in place, people can feel much more confident about generating their own bytecode without causing any hard crashes in the system.
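To make the idea concrete, here is a minimal sketch of the kind of check a verifier performs. It is not the Rubinius verifier: the opcode names, the STACK_EFFECTS table, and the verify method are all hypothetical, and it only handles straight-line code with no branches. It walks a list of instructions and confirms that the stack never underflows, never grows past the requested size, and that every local variable index is in range.

```ruby
# Hypothetical mini instruction set: opcode => [values popped, values pushed].
# These names are illustrative only and are not Rubinius's actual opcodes.
STACK_EFFECTS = {
  push_int:   [0, 1],
  push_local: [0, 1],
  set_local:  [1, 1],
  add:        [2, 1],
  pop:        [1, 0],
  ret:        [1, 0]
}.freeze

class VerificationError < StandardError; end

# Checks straight-line bytecode against the limits the method declared.
# Returns the peak stack depth, or raises VerificationError.
def verify(instructions, max_stack:, local_count:)
  depth = 0
  peak = 0
  instructions.each_with_index do |(op, arg), ip|
    pops, pushes = STACK_EFFECTS.fetch(op) { raise VerificationError, "unknown opcode #{op} at #{ip}" }
    if %i[push_local set_local].include?(op)
      in_range = arg.is_a?(Integer) && arg >= 0 && arg < local_count
      raise VerificationError, "local #{arg.inspect} out of range at #{ip}" unless in_range
    end
    raise VerificationError, "stack underflow at #{ip}" if depth < pops
    depth = depth - pops + pushes
    raise VerificationError, "stack overflow at #{ip}" if depth > max_stack
    peak = depth if depth > peak
  end
  peak
end

program = [[:push_int, 1], [:push_int, 2], [:add], [:ret]]
verify(program, max_stack: 2, local_count: 0)  # => 2
```

Running the same program with max_stack: 1 raises VerificationError at the second push, which is the sort of mistake a verifier is meant to catch before the code ever runs.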

Memory Efficiency

Ruby is being used for larger and larger software projects these days. This makes memory efficiency very important. There are two measures of memory efficiency: growth stability and memory usage per object. Growth stability is largely a feature of the garbage collector, and it is something that Rubinius has done quite well for some time now, so we focused on improving the memory usage per object.

Specifically, we looked at how an object stores its instance variables in memory. Because Ruby does not require instance variable declaration, the simplest way to model instance variables is with a hash. This is precisely what Rubinius used to do. The problem shows up with classes that have a small number of instance variables. In that case, the overhead of the hash table is substantial, needing more than 100 bytes of memory just to store one word (either 4 or 8 bytes)! And so we set about reducing this overhead. Because Rubinius uses so many Ruby classes internally, we knew that a fix would have immediate benefits.

The new code is based upon an easily observable assumption about a class, namely that it defines the vast majority (usually all) of its methods before an instance of the class is created. We exploit this by running some code the first time an instance of a class is created which looks at all methods available to instances of this class. This means all methods defined in the class itself, its superclasses, and any mixed in modules. From the methods, we build a table of all instance variables those methods use.

Now we can construct a very good picture of how memory should be laid out for instances of this class, allowing us to store the instance variables in memory without needing a hash table. The memory usage typically goes from 100 bytes to 8 bytes on a 64-bit machine. Quite a savings!
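The approach can be modeled in plain Ruby. This is only a sketch under stated assumptions, not the VM's implementation: IVARS_USED and ANCESTORS are hypothetical stand-ins for what the compiler and class hierarchy already know, and PackedInstance is an invented name. The overflow hash for instance variables the layout did not predict is also an assumption on our part, added so the sketch still works when a method is defined late.

```ruby
# Hypothetical stand-in for compiler knowledge: for each class or module,
# the instance variables referenced by each of its methods.
IVARS_USED = {
  "Shape"    => { initialize: [:@name] },
  "Drawable" => { draw: [:@color] },
  "Circle"   => { initialize: [:@name, :@radius], area: [:@radius] }
}.freeze

# Hypothetical lookup order: the class, its mixed-in modules, its superclasses.
ANCESTORS = { "Circle" => %w[Circle Drawable Shape] }.freeze

# Builds a name => slot index table, as might be done the first time
# an instance of the class is created.
def build_layout(class_name)
  names = ANCESTORS.fetch(class_name).flat_map do |ancestor|
    IVARS_USED.fetch(ancestor, {}).values.flatten
  end
  names.uniq.each_with_index.to_h
end

# Stores predicted instance variables in a flat array of slots.
# Unpredicted names fall back to a hash (an assumption of this sketch).
class PackedInstance
  def initialize(layout)
    @layout = layout
    @slots = Array.new(layout.size)
    @overflow = {}
  end

  def set(name, value)
    index = @layout[name]
    if index
      @slots[index] = value
    else
      @overflow[name] = value
    end
  end

  def get(name)
    index = @layout[name]
    index ? @slots[index] : @overflow[name]
  end
end

layout = build_layout("Circle")  # => {:@name=>0, :@radius=>1, :@color=>2}
circle = PackedInstance.new(layout)
circle.set(:@radius, 3)
```

The layout table is built once per class and shared by every instance, so each instance only pays for its slots rather than for its own hash table.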

Debugger

A good debugger is invaluable when working on code of any kind. One of the big additions since 1.0 is the built-in debugger. Gems such as ruby-debug wouldn't compile, let alone work, because Rubinius doesn't share any internals with MRI.

We decided to take a different approach than most debuggers for languages. Typically, the debugger is delivered and used via some kind of command line interface only. We wanted a command line interface, but we didn't want it to be the only way into the system.

So instead, we built a Debugging API into the VM itself and built the CLI debugger on top of this API. This means it's available to other projects that want to build new and innovative debuggers. In fact, the CLI we ship should be considered a kind of reference implementation. It's a bit short on features, but it shows how to use the API and build upon it. We've already had people begin to port their debugger logic over to the Rubinius API so that existing debuggers can be plugged into Rubinius simply.

Using the debugger is easy:


require 'rubinius/debugger'
Rubinius::Debugger.here

This will drop the code into the debugger at the .here method call, allowing you to inspect the call stack and objects on it. You can also use the -Xdebug option to rbx, which will start the debugger before loading the initial program, allowing you to set breakpoints before loading code.

For 1.2, we've introduced a special ruby-debug shim gem. This gem doesn't contain ruby-debug, but instead emulates its most common entry point and invokes the built-in debugger. This means that projects such as Rails, which have ruby-debug support built in, work out of the box.

In future releases, we'll continue to improve on the debugging APIs as well as the CLI interface. So if you've got ideas for improvement, be sure to let us know!

We're also looking to build a list of projects that are beginning to add support for the debugging API. This includes frameworks like Rails and editors such as Emacs, Vim, TextMate, etc. If you're interested in adding support for your favorite project, let us know so we can help!

Query Agent

Query Agent (QA) is yet another tool that developers can use to debug and introspect their running programs. It provides the ability for the VM to export all kinds of low level data such as statistics about the garbage collectors. In addition to raw stats, it provides the ability to trigger functionality by reading and writing values.

For example, to get a live backtrace of all threads, simply read the system.backtrace variable. The values returned are calculated on request and thus reflect the current state of the system.
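The read-and-write model can be pictured with a small in-process sketch. This is not the Query Agent itself and does not speak its wire protocol: VariableTree, on_read, and on_write are hypothetical names invented for illustration. Only the variable names system.backtrace and system.memory.dump come from Rubinius. The point is that a read runs code at request time and a write can trigger an action.

```ruby
# Hypothetical model of a tree of named variables backed by callbacks.
class VariableTree
  def initialize
    @readers = {}
    @writers = {}
  end

  # Registers a block that computes the value each time it is read.
  def on_read(name, &block)
    @readers[name] = block
  end

  # Registers a block that runs each time the variable is written.
  def on_write(name, &block)
    @writers[name] = block
  end

  def get(name)
    @readers.fetch(name).call
  end

  def set(name, value)
    @writers.fetch(name).call(value)
  end
end

tree = VariableTree.new

# Computed fresh on every read, so it reflects the threads alive right now.
tree.on_read("system.backtrace") do
  Thread.list.map { |thread| thread.backtrace || [] }
end

# A write triggers an action; here it only records the requested path.
requested_dumps = []
tree.on_write("system.memory.dump") { |path| requested_dumps << path }

tree.get("system.backtrace")
tree.set("system.memory.dump", "/path/to/file")
```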

Implementation-wise, Query Agent is a socket-based API that is implemented directly by the VM. We opted to use BERT as the wire protocol, which allowed us to easily write a Ruby client using the existing BERT encoder/decoder gem.

We hope that people will begin incorporating Query Agent support into their monitoring tools, allowing them to get very rich data about, and control over, their Ruby processes.

Heap Dump

Lastly, we have integrated a memory debugging tool directly into the VM. Heap Dump provides the ability to write out the entire object graph to disk in a stable, portable format. That file can then be read back in and analyzed. A very common analysis is simply to count how many objects of each class exist in the system. This knowledge alone can help developers track down object leaks in their code. There are currently two interfaces to Heap Dump, one in Ruby and one via the Query Agent.

In Ruby:


Rubinius::VM.dump_heap("/path/to/file")

and via the Query Agent:

set system.memory.dump /path/to/file

Having access via Query Agent makes it possible to debug production processes offline.

We've only begun writing tools to analyze the dumps; they are available at https://github.com/evanphx/heap_dump.
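As an illustration of the per-class count mentioned earlier, here is a minimal sketch. It does not read the real dump format and is not taken from the heap_dump tool: it assumes the dump has already been decoded into a list of records, and the :class_name key and the histogram method are hypothetical.

```ruby
# Counts objects per class and returns [class name, count] pairs,
# largest count first, with ties broken by class name.
# Assumes each record is a Hash with a :class_name key (hypothetical shape).
def histogram(records)
  counts = records.each_with_object(Hash.new(0)) do |record, tally|
    tally[record[:class_name]] += 1
  end
  counts.sort_by { |name, count| [-count, name] }
end

records = [
  { class_name: "String" },
  { class_name: "Array" },
  { class_name: "String" }
]
histogram(records)  # => [["String", 2], ["Array", 1]]
```

Comparing two such histograms taken some time apart is a simple way to spot a class whose instance count keeps growing.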

Beyond 1.2

The team has been doing great since the 1.0 release, expanding Rubinius compatibility and improving performance. Coming up in the next few months, we've got three key features we're really excited about: 1.9 support, Microsoft Windows support, and true concurrency. These are big-ticket items that we've been asked about a lot, and they will put Rubinius into more and more developers' hands.

I'd like to thank everyone for all the support this year. It's been a wonderful year full of great releases. Seeing Rubinius continue to grow and blossom in 2011 should be even better!

