| |

Subscribe / Log in / New account

Alternatives to SQL Databases

April 14, 2009

This article was contributed by Ian Ward

Traditional SQL databases with "ACID" properties (Atomicity, Consistency, Isolation and Durability) give strong guarantees about what happens when data is stored and retrieved. These guarantees make it easier for application developers, freeing them from thinking about exactly how the data is stored and indexed, or even which database is running. However, these guarantees come with a cost.

Bob Ippolito presented a talk titled "Drop ACID and think about data" at PyCon 2009, which gave an overview of number of non-traditional databases. These alternatives compromise one or more of the ACID properties and expose the particulars of that data store's implementation in exchange for improved performance or scalability. Each also has its own limitations. This article will look at the more mature open source options Ippolito mentioned.

A number of companies have developed their own in-house data stores, including Amazon's Dynamo and Google's Bigtable. While none of the open source options are exactly like Dynamo or Bigtable, there are a number of high-performance, reliable and scalable options available.

Alternative Database Language Support
* means language support is pending
Cassandra:	C++, C#, Java, Perl, Python, PHP, Erlang, Ruby
Memcached:	C/C++, C#, Java, Perl, Python, PHP, Ruby, Lua, OCaml, Common LISP
Tokyo Cabinet:	C/C++, Java, Perl, Ruby, Lua
Redis:	C/C++, Java*, Perl, Python, PHP, Erlang, Ruby, Lua, Tcl
CouchDB:	C#, Java, Perl, Python, PHP, Erlang, Ruby, Haskell, JavaScript, Common LISP
MongoDB:	C++, Java, Python, PHP, Erlang*, Ruby

Cassandra

Cassandra is a data store written in Java that was open-sourced by Facebook and is now part of the Apache Incubator. Cassandra was originally designed to solve Facebook's in-box searching problem. Email reverse indexes were growing much faster than their databases could keep up with and they needed a affordable way to continue to grow.

Cassandra is designed to scale inexpensively with commodity networking equipment and servers, possibly in multiple data centers. Scalability and high availability are achieved by automatically "sharding" and replicating data across the servers and data centers.

A single Cassandra instance stores a single table, and each row is accessed with a key string. Every row of this table can have its own structure, storing a huge number of (key, value, time-stamp) tuples and/or nested columns. This makes Cassandra much more flexible than a simple key-value store, but not as general as a document database.

Although in heavy use by Facebook, Cassandra is early in development and still lacks some polish and documentation.

Memcached

Perhaps the simplest key-value store is Memcached. Memcached is widely used to to speed up web applications by caching dynamic content. Part or all of the web pages are served from the cache instead of generating them at each request. Unlike in-process or shared memory caches, Memcached listens on a network socket and can be shared by many servers. Memcached may also be run on multiple servers and it will spread the keys across those servers and transparently fall back to servers that are still available when one goes down.

Memcached keys and values are always strings. In addition to storing, retrieving and deleting values, it allows atomic appending/prepending string data to stored values and addition to/subtraction from 64-bit integer values stored as decimal strings.

Memcached's data store is a fixed size and resides entirely in-memory. Data may be stored with an expiration time. Memcached will actively throw out data when the cache is full or when the data is set to expire.

Tokyo Cabinet

For a key-value data store that won't throw out data, Tokyo Cabinet is a good choice. Like Berkeley DB, it uses either a hash table, B+ tree or a array of fixed-length records to store data on disk, but Tokyo Cabinet performs better and is thread safe. Tokyo Cabinet also promises to never corrupt data even in a "catastrophic situation". Tokyo Cabinet is actively maintained, and data stored is not limited by system RAM.

Tokyo Cabinet clients are separated into readers and writers. When a writer is accessing the database all other clients are blocked. This will result in poor performance for write-heavy workloads.

Tokyo Cabinet supports appending data to values stored. When using a B+ tree layout Tokyo Cabinet provides a cursor object to efficiently move forward and backward through the keys. B+ tree pages may also be compressed on disk with zlib or bzip2. Compressing data not only saves disk space, but can also increase performance on I/O-bound systems.

Redis

Redis is a disk-backed, in-memory key-value store with a number of additional features. Redis supports master-slave replication for redundancy, but not sharding, so all data must fit in a single system's RAM. Redis values may be binary strings, lists or sets. Redis provides atomic addition to/subtraction from integer values stored as decimal strings and push/pop/replacement of values in lists. The intersection of set values stored may also be calculated.

Redis can asynchronously save the database on request by forking the server process and writing out data in the background. The last successful save time may be queried to check when the changes have made it to disk. This design allows for good performance with the ability to save data when it makes sense for the particular application, but the application is responsible to make sure data is properly saved.

When using any key-value store as a cache care must be taken to invalidate values when the data changes or inconsistency will be introduced. Choosing a memory-only cache will be faster once it is populated, but there is a cost associated with filling an empty cache on restart. Key-Value stores are ideal for storing data that is not deeply nested and does not require complicated locking.

CouchDB

Document databases are designed to store large blocks of semi-structured data. The data is not restricted to a particular schema, so new versions of data can be stored alongside old versions without the need for migrations. Documents can be very large and deeply nested.

CouchDB is a JSON-based document database written in Erlang. CouchDB gives access to the database over HTTP with a RESTful API. Views of the database may be created on demand using Javascript to collect and filter document contents and are updated as documents change. Indexes are not maintained outside of views, so there is a start-up cost associated with constructing a new view.

CouchDB documents are stored with a sequence number and are never overwritten, this way partial writes will never result in data corruption. Readers are never blocked by writers and will always see a consistent snapshot of the database while reading data. The data is periodically compacted by writing out a new data file and deleting the original once it is no longer being accessed.

CouchDB uses peer based asynchronous replication. Documents may be updated on any peer, allowing for good write throughput. Conflicts will occur when two clients update the same document, and multiple conflicting documents may coexist in the database. A deterministic method is used to decide which document will be treated as the latest version. This lets CouchDB leave conflict resolution to the application. Once a conflict is resolved the new version is stored in the database as usual.

MongoDB

MongoDB is a document database written in C++. MongoDB uses a binary-encoded JSON format that shrinks the data size and allows for faster searching and indexing. Large binary data, such as video files, can also be stored more efficiently in this format. Data is updated in place and MongoDB will automatically run a repair procedure on the database in the event of an unclean shutdown.

MongoDB documents may be nested or include references to other documents. References will be replaced with the value of the referenced document when queried. MongoDB supports persistent single or compound key indexes. Indexes are implemented as B-Trees and queries will automatically take advantage of all indexes available. Queries may include common conditional operators, membership testing and values in embedded documents.

MongoDB has auto-sharding support, splitting documents across many servers so that data stored is not limited by the capacity of a single server. MongoDB also supports asynchronous replication for high availability.

Choosing a Data Store

The best data store for an application depends in large part on how deeply nested the values stored will be. If the application needs to only store strings and integers then a simple key-value store like Memcached, Tokyo Cabinet or Redis would be best. If the values can be represented as lists and sets of simple values then Tokyo Cabinet, Redis or Cassandra would be good options. If the application needs nested lists and hashes then choose Cassandra, CouchDB or MongoDB. Finally, if the values contain deeply nested data then only a document database like CouchDB or MongoDB will do.

Once a data store has been chosen and and the application optimized for it, switching to a completely different API will not be easy. It is worth investing time evaluating the remaining options by writing code to simulate the application's usage patterns before making a choice.

Index entries for this article
GuestArticles	Ward, Ian

Alternatives to SQL Databases

Posted Apr 16, 2009 4:16 UTC (Thu) by jordanb (guest, #45668) [Link] (4 responses)

One thing to consider when looking at storing persistent data is how much effort the data store developers have put into maintaining data integrity. This was driven home to me when I was administering a computer running OpenLDAP and Fedora. The computer was in a situation where it would occasionally be hard-booted, and that combined with some (apparent) version incompatibilities with the Fedora repos would cause the sleepycat backend for the LDAP server to corrupt itself.

Our eventual (horrible, rube-goldbergian) workaround was to store the actual information in MySQL/innodb and have a script nuke the LDAP database and re-inject the data whenever it fell over.

I decided after that experience that if the data has even a chance of being important, then the most important property of any datastore is that the information should always be there and never be corrupt, regardless of if the computer is hard-booted or if the disk drive lies or if Ted Tso decides that his ideology is more important than your data.

A bit later I heard an interview with Richard Hipp in which he discussed how much effort they put into making sure data in sqlite is "in the oxide" -- going so far as simulating hard boots during writes in their testing procedures. I've since made sqlite my default data store whenever the data might be important and it's not in an RDBMS and would hesitate to go to something else without some assurance that they take a similar amount of interest in data integrity.

That said, memcached is awesome for caching, and some of these things do sound interesting for storing unimportant data like search indicies.

Alternatives to SQL Databases

Posted Apr 16, 2009 17:00 UTC (Thu) by intgr (subscriber, #39733) [Link] (2 responses)

I think the problem you were having with Sleepycat BDB wans't "corruption" per se. BDB requires an explicit recovery command after an unclean shutdown and many applications using it do not automatically run this.

This annoying feature has probably driven off many of Sleepycat's potential customers.

Alternatives to SQL Databases

Posted Apr 16, 2009 19:43 UTC (Thu) by jordanb (guest, #45668) [Link] (1 responses)

Yeah in this case we tried making the LDAP init script do that, but it seemed to make the problem even worse. We began to suspect that the problem was that there was a version incompatibility between the bdb tools we had and the library being used, which was likely Fedora's fault, but we weren't able to verify that.

But anyway, 'need to explicitly run recovery before attempting to use the database again' is a design failure, imho. The database should be able to recognize that it is not completely consistent and recover itself on startup, and there should always be enough data to reach a consistent state that's not too far from the state when the system crashed.

Alternatives to SQL Databases

Posted Apr 23, 2009 14:44 UTC (Thu) by joib (subscriber, #8541) [Link]

At least on RHEL, you need to use slapd_db_recover rather than dbxx_recover, and similar for the other db_* commands, this will use the correct version of the BDB libraries that openldap was built against.

But yeah, needing manual recovery after a crash is incredibly annoying. Though we have replicated openldap servers that reduce the likelihood of service disruptions due to this.

Alternatives to SQL Databases

Posted May 1, 2009 18:00 UTC (Fri) by gdamjan (subscriber, #33634) [Link]

Well, at least CouchDB is very robust.

CouchDB will only append to the storage file on disk, and will add a token at the end of the append. So in the wors case you only loose the last "transaction".

CouchDB does not need any special recovery tools, and instatly restarts

Alternatives to SQL Databases

Posted Apr 16, 2009 4:37 UTC (Thu) by tstover (guest, #56283) [Link]

Tokyo Cabinet also has a server component called Tokyo tyrant that not only serializes access for multiple process, but also supports the memcached protocol for a persistent memcached solution. It's also the successor to the noteworthy qdbm. It has much to offer those exploring data storage black magic.

Alternatives to SQL Databases

Posted Apr 16, 2009 8:54 UTC (Thu) by djc (subscriber, #56880) [Link]

Hmm, I've heard that Cassandra is still early stages, with a lot of big code drops from Facebook and not much development in the open.

Might also be interesting to note that both Cassandra and CouchDB are now Apache projects (the former is still in the incubator; the latter recently graduated to top-level).

Alternatives to SQL Databases

Posted Apr 16, 2009 8:58 UTC (Thu) by viiru (subscriber, #53129) [Link]

Perhaps Memcachedb should've been mentioned here, also. It uses BerkeleyDB for storage and the memcached protocol for communication. It's available at http://memcachedb.org/ and also packaged for Debian (by yours truly..)

Stonebreaker et al research supports traditional

Posted Apr 16, 2009 14:12 UTC (Thu) by mrjk (subscriber, #48482) [Link]

There has been a recent study by Michael Stonebreaker and company comparing a "traditional"
parallel SQL database with Map-Reduce over a 100 server farm and finding the SQL version was
much more efficient. Now M.S. is not exactly unbiased here, but still it was interesting. You have to
be ACM member right now I think, but look at ACM Transactions shortly I believe.

Alternatives to SQL Databases

Posted Apr 16, 2009 18:04 UTC (Thu) by rfunk (subscriber, #4054) [Link] (5 responses)

While none of these is relational, in most cases I don't see where they fail
ACID. Can anyone point to the specific non-ACIDity of each of these?

Alternatives to SQL Databases

Posted Apr 19, 2009 17:00 UTC (Sun) by dmag (guest, #17775) [Link] (4 responses)

http://en.wikipedia.org/wiki/ACID

You're running a bank and want to debit $10 from one account and credit $10 to another account. You want it all to happen or none to happen. (Atomicity)

You don't want your "end-of-month" summary report to be off by $10 when that money just happened to be "in transit" when the report was run. Nobody should see the in-between "bad" states. (Consistency, Isolation)

Even more important, you don't want a server failure (crash, power off) to *ever* leave things in that intermediate state permanently. (Durability)

ACID databases can't do much in parallel because it must always think about the strict ordering of transactions.

On the other hand, if you're running a web forum, maybe you're willing to live with the possibility of loosing a few messages (on server failure) or allowing new posts in a deleted forum (for a few seconds) in exchange for scaling 100x better.

(I predict that non-relational/non-ACID will become the dominant form of databases -- because very few things actually need all properties of ACID.)

Alternatives to SQL Databases

Posted Apr 20, 2009 18:22 UTC (Mon) by rfunk (subscriber, #4054) [Link] (3 responses)

I may not have been clear. I'm not looking for a general explanation of ACID; I know what it is. I'm looking for specifically how each of these doesn't fit ACID.

Alternatives to SQL Databases

Posted Apr 22, 2009 2:31 UTC (Wed) by dlang (guest, #313) [Link] (2 responses)

I agree that this is a huge gap in this article.

after the first two paragraphs it ignores the issue it raises to become just a list of random datastores

to make a reasonable decision we need to know what the trade-offs are of each option

for example

memcached fails ACID because it stores everything in ram, so it looses the D (durability)

note that many 'regular' databases can also be configured to sacrafice durability in the name of performance.

the real common thread between these datastores is less the fact that they sacrafice ACID than in the fact that they ignore SQL.

Alternatives to SQL Databases

Posted Apr 22, 2009 12:49 UTC (Wed) by dmag (guest, #17775) [Link] (1 responses)

> the real common thread between these datastores is less the fact that they sacrafice ACID than in the fact that they ignore SQL.

Yes and no. The reason they don't have SQL is that they are young and focused on being different than RDBMSes.

It's actually not that hard to add some SQL support. Amazon's SimpleDB recently added "SQL-like" querying (nothing fancy, just "Select * from Table Where Field=Value"). There are a lot of SQL parsers out there, so it wouldn't be too hard for the others to add a large dose of SQL. Mind you, I don't think any of these will be 100% fully SQL-compliant. But then again, just about every RDBMS ignore some of the dark corners of the SQL standard anyway.

The reason for this new generation is that they scale better on one box, and scale better on multiple boxes. There's a reason that Amazon, Google, Yahoo, etc aren't "powered by Oracle" at their heart.

Each makes completely different assumptions about data. For example, if you are OK with "eventually consistent", you can have better availability during a network partition event.

I think their biggest win will be performance. All of these projects are too young to be fully tuned, but "Real" databases have a lot of overhead logic (query parser, query optimizer, transaction subsystem) that could be tossed out if you want 'bare metal' performance. For example, storing your Order + all its LineItems together means less I/O. Even if you tell your RDBMS to write to memory, I'll bet it's doing all kinds of layout tricks to optimize the "disk".

Alternatives to SQL Databases

Posted Apr 22, 2009 23:03 UTC (Wed) by nix (subscriber, #2304) [Link]

I've seen one site that I know for sure is 'powered by Oracle': Oracle's
own Metalink site. Its performance is utterly appalling: half-minute
delays between doing anything and the response... Oracle's stunning (lack
of) useful full-text search capabilities shine through in the completely
hopeless search page as well. Of course I don't know what systems back
this site but I doubt it's exactly underpowered.

For a long time they had their advertising slogan 'Oracle Software Powers
The Internet' on there. This led to despairing laughter and the
occasional 'thank god it doesn't' from everyone who saw it, including
various Oracle employees.

Oracle is quite good at massive thumping bank systems, but I wouldn't back
a website with it if I were you. Even Oracle can't make that work.

CDB

Posted Apr 16, 2009 22:43 UTC (Thu) by phiggins (subscriber, #5605) [Link]

I just heard about CDB a couple of months ago and became enamoured with it's simplicity. For applications where updates are infrequent and the size of the database is relatively small (such that the entire database can be rewritten whenever and update is made), then CDB probably cannot be beat for reliability and performance. At first, I assumed that almost no domains would be suitable for this, but I was surprised when I started looking at what I've written that actually is suitable for CDB.

The reliability of CDB depends on renaming a new file over the old database file, and I'm not sure if DJB's CDB calls fsync() or not. tinycdb does not. Make sure your filesystem can't lose your data with a rename before choosing CDB for reliability.

http://cr.yp.to/cdb.html
http://www.corpit.ru/mjt/tinycdb.html

Alternatives to SQL Databases

Posted Apr 17, 2009 15:13 UTC (Fri) by nlucas (guest, #33793) [Link]

SQLite could also be mentioned here because it's flexibility allows it.
If you don't mind losing data after a crash, disable that feature with "PRAGMA synchronous=OFF".
If you want a memory database, just open the ":memory:" database.
You can attach several databases and proceed as if it's just one.
You want even more control, use the "virtual tables" feature, where you can treat, for example, CVS files as tables, or even create a virtual table that connects to another database.
By implementing a new "VFS" layer you can change the low-level interface with the "disk", like add encryption, compression, make I/O using mmaped memory, whatever.
It's not a database for dummies. The "lite" in SQLite means you don't have a full blown SQL optimizer, so you need to do the work of actually optimizing the SQL queries beforehand. That is your job as a programmer, not SQLite.

PostRelational databases

Posted Apr 23, 2009 16:46 UTC (Thu) by Wol (subscriber, #4433) [Link]

If anybody notices a response to a story that's gone a bit stale...

At the end of the day, the problem with relational databases is C&D's 12 rules. You CANNOT have an efficient relational database because the inefficiency is mandated by the rules.

Who says "data comes in rows and columns"? C&D. But what happens if it comes in three, or four, dimensions? Sure, you can MODEL that in two dimensions, but the modelling can get very expensive ...

The relational maths is great, but any relational engine is crippled by its adherence to the underlying (flawed) rules.

For example, I know I got into a spat on a previous story, but many cases with relational engines that require ACID, *don't* require ACID on post-relational because they're inherently atomic.

Cheers,
Wol

Alternatives to SQL Databases

Posted Apr 23, 2009 19:00 UTC (Thu) by jbellis (guest, #14804) [Link]

djc et al,

Cassandra is doing well in its reincarnation as an Apache Incubator project. I was added as a committer outside Facebook, and I'm working full time on Cassandra now for Rackspace. Patches are being reviewed and applied and we're making good progress towards an official release.

I wrote about why Cassandra was the best fit for our needs here: http://spyced.blogspot.com/2009/03/why-i-like-cassandra.html