LDAP Directories: The Forgotten NoSQL

When most Rails developers encounter LDAP, it's usually for user authentication. And most of the time they have no choice: they're working under a dictate that requires them to use it. Usually, this means Active Directory, but very occasionally something like OpenLDAP or the Sun Java System Directory Server.

It's hard to imagine now, but there was once great excitement about the potential for LDAP-based directory servers to become more than just authentication servers and morph into general-purpose datastores. LDAP directories promised a single, scalable, high-performance data store that could be queried for common information across multiple applications. After all, directories had a lot of virtues:

  • Fast Queries: LDAP directories were heavily indexed, so query speeds were truly impressive—reliably 10x what a relational database could manage. (Write speed was much slower for the same reason: lots of indexes to update when a write happened)
  • Replication: LDAP directories were an "eventually consistent" data store long before Dynamo or Cassandra. Multi-master replication allowed a distributed network of directories to accept writes at any node, and then relay these updates around the directory network. The last update in time always won.
  • Partitionable: Directories were giant tree structures, and branches could be picked up and moved to another server if the directory got too big. There was built-in referential linking from each amputation point to the correct server, and these servers could be easily geographically distributed.
  • Standardized and efficient: Coming from a telecom heritage, LDAP was an efficient wire protocol. It was globalized and cross-system. LDAP queries and responses were defined in ASN.1 and binary encoded on the wire using the Basic Encoding Rules (BER).
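
To make the "last update in time always won" rule from the replication bullet concrete, here is a minimal Python sketch of last-writer-wins merging between two masters. It is an illustration only, not how any particular directory server implemented replication: the `Replica` and `Update` classes, the `sync` helper, and the use of the replica name as a tie-breaker are all assumptions of mine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    dn: str            # entry name, e.g. "uid=jane,ou=people,dc=example,dc=com"
    attr: str          # attribute being written
    value: str
    timestamp: float   # when the originating replica accepted the write
    replica_id: str    # hypothetical tie-breaker for identical timestamps


class Replica:
    """One master in a multi-master topology (illustrative only)."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.store = {}  # (dn, attr) -> most recent Update seen so far

    def write(self, dn, attr, value, timestamp):
        # Any replica accepts a write locally, without coordinating first.
        update = Update(dn, attr, value, timestamp, self.replica_id)
        self.apply(update)
        return update

    def apply(self, update):
        # Last writer wins: keep the update with the later timestamp.
        key = (update.dn, update.attr)
        current = self.store.get(key)
        if current is None or (update.timestamp, update.replica_id) > (
            current.timestamp,
            current.replica_id,
        ):
            self.store[key] = update

    def read(self, dn, attr):
        update = self.store.get((dn, attr))
        return update.value if update else None


def sync(a, b):
    # Relay each side's updates to the other; both converge on the same state.
    for update in list(a.store.values()):
        b.apply(update)
    for update in list(b.store.values()):
        a.apply(update)


east, west = Replica("east"), Replica("west")
dn = "uid=jane,ou=people,dc=example,dc=com"
east.write(dn, "mail", "jane@old.example.com", timestamp=100.0)
west.write(dn, "mail", "jane@new.example.com", timestamp=105.0)
sync(east, west)
print(east.read(dn, "mail"), west.read(dn, "mail"))
```

Until `sync` runs, the two replicas disagree, which is the "eventually" in eventually consistent. The cost of the rule is also visible here: the earlier write is silently discarded.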

In addition to these benefits, directories like Netscape Directory Server and Microsoft Active Directory had a seemingly endless list of other features: rich, complex, configurable access control rules and permissions; multiple ways to define groups; expressive query semantics; and more.

And yet, when we look around today, it's not LDAP directories that have the NoSQL buzz; it's the far looser and simpler key-value and document stores like Cassandra, MongoDB and Redis. So where did LDAP fall down, and is there anything to be learned from its (relative) failure? Here is my own take on why LDAP didn't take over the world, colored by my (brief) tenure as a product manager for Netscape Directory Server.

  1. Telecom protocols FTL: LDAP, in my own humble opinion, was fatally crippled by its telecom parentage. Just reading the first page of the ASN.1 data structure specification could make your eyes bleed. Debugging a badly behaved LDAP client or query was basically a job for experts wielding binary-to-text crackers. There was a separate text format, LDIF, for representing directory data in human-readable form, but this was a friction point. Compared to ASN.1, JSON (as an example) is severely limited and incomplete, and yet... about 1000x more popular as a result.
  2. Access control that exceeded human brain capacity: LDAP directories provided lots of rope for people who cared about security to firmly and irrevocably tie themselves in knots. Time and again, I'd see customers with five or more layers of access control rules they found to be confounding, with counter-intuitive effects. Better yet, this level of complexity was indecipherable by anyone without drawing five-dimensional set diagrams. Sometimes, there are features you shouldn't put into a product no matter how much people ask you. They know not what they do.
  3. Interesting data wanted to be relational: It was a simple but sad truth. Data that's interesting and important enough to be accessed often by your applications seems to want to be compared and operated on in the context of your other interesting data; that sounds a lot like the right case for a relational database. Directories, as a hierarchical data store, couldn't easily accommodate the kinds of queries that customers ended up wanting to do once they were storing enough interesting data. So the solution was to patch in "relationy" features like aliases, which soft-linked two values in different parts of the tree, but these were patchwork solutions. In their worst (over-used) incarnation, they turned a directory server into a weird, hard-to-maintain mutant hybrid of relational and hierarchical database.
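
For a feel of what point 1 meant in practice, here is a small Python sketch that hand-encodes the search filter `(cn=Jane)` the way it travels on the wire: as nested BER tag-length-value triples. The helper names (`ber_length`, `ber_tlv`, `ber_octet_string`, `equality_filter`) are mine, and this covers only the one filter type; a real client would use a library rather than anything like this.

```python
def ber_length(n: int) -> bytes:
    """Encode a BER length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def ber_tlv(tag: int, content: bytes) -> bytes:
    """Wrap content in a single-byte tag plus its encoded length."""
    return bytes([tag]) + ber_length(len(content)) + content


def ber_octet_string(text: str) -> bytes:
    # 0x04 is the universal tag for OCTET STRING.
    return ber_tlv(0x04, text.encode("utf-8"))


def equality_filter(attribute: str, value: str) -> bytes:
    # 0xA3 is the context-specific, constructed tag [3], which the LDAP
    # filter definition uses for an equality match. Its content is the
    # attribute name followed by the asserted value.
    return ber_tlv(0xA3, ber_octet_string(attribute) + ber_octet_string(value))


human_readable = "(cn=Jane)"
on_the_wire = equality_filter("cn", "Jane")
print(human_readable)
print(on_the_wire.hex(" "))
```

The text form is nine characters anyone can read. The wire form is twelve bytes of tags and lengths that mean nothing until a decoder walks them, and that is for about the simplest filter there is.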

There were other downsides to LDAP directories, of course. The learning curve could be steep for LDAP since it was a truly novel technology for most people used to RDBMSs and SQL. And probably most importantly, most directories weren't open source, and so they missed the opportunity to fully leverage a community of interested developers and administrators.

Lessons for this Generation of NoSQL (?)

I hesitate to speculate on the lessons from LDAP for this generation of NoSQL stores, since open source has changed the game considerably in the last ten years. That said, I do think LDAP got a lot of things right (fast, distributable, scalable and standardized). It's arguable whether custom binary protocols (such as MongoDB's) will really hurt adoption as long as the data structure specifications are reasonably readable, but CouchDB's JSON/REST/HTTP combo is certainly a little easier on the eyes.

I do know one thing: keep the access control simple. Your users will thank you later!