When to Use CouchDB

This article is based on George Palmer’s talk at the Rails Underground Conference last week. George is the developer of the couch_foo plugin, which lets Ruby code interact with CouchDB in an ActiveRecord style.

A CouchDB database is a collection of documents that represent objects with simple named fields. Documents are stored as JSON, and subsets of documents are handled via views. Views are built dynamically and are used for aggregating and reporting on the documents in a database. Unlike schema-based SQL databases, CouchDB works on semi-structured, document-oriented data, which is the kind of data most collaborative web applications use. This structure allows new document types to be added alongside the old ones. CouchDB has a peer-based distributed architecture, in which multiple CouchDB hosts can hold independent “replica copies” of the same database.

REST can be used as an interface to view the stored documents. Using REST brings advantages such as load balancing and caching. The views that select a subset of documents are saved in documents named “_design/…”. Such a design document defines how to get the result set that we want to view.

A simple view function that shows the documents with a type of “van” is like this:

function(doc) {
  if (doc.Type == "van") {
    emit(doc.Name, doc);
  }
}

This should return the documents with the “van” type in a JSON structure. The first parameter of the emit function is the key of the resulting row, which in this case is the Name property of the document; the second parameter is the value. emit works by simply storing the key/value pairs in an array; when all views in the same _design document have been calculated, all results are returned at once.
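The behaviour described above can be imitated outside CouchDB. The following is a minimal sketch, not CouchDB's actual view server: emit, runView and sampleDocs are hypothetical stand-ins written for this illustration, and the sample documents are made up.

```javascript
// Rows collected by emit() during the current view run (toy model only).
let collected = [];

// Hypothetical stand-in for the emit() that CouchDB provides to map functions.
function emit(key, value) {
  collected.push({ key: key, value: value });
}

// The map function from the article, given a name so it can be passed around.
function vansByName(doc) {
  if (doc.Type == "van") {
    emit(doc.Name, doc);
  }
}

// Run a map function over every document and return the rows sorted by key.
function runView(mapFn, docs) {
  collected = [];
  docs.forEach(function (doc) { mapFn(doc); });
  return collected.slice().sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
}

// Made-up sample documents, assumed for illustration only.
const sampleDocs = [
  { _id: "a", Type: "van", Name: "Transit" },
  { _id: "b", Type: "car", Name: "Fiesta" },
  { _id: "c", Type: "van", Name: "Sprinter" }
];

const vanRows = runView(vansByName, sampleDocs);
console.log(vanRows.map(function (row) { return row.key; }));
```

Only the two “van” documents produce rows, and the rows come back ordered by the emitted key rather than by insertion order.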

Associations in a query were described with an example:

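function(doc) {
  if (doc.type == "post") {
    // Posts are keyed by their own id, with 0 as the second key element.
    emit([doc._id, 0], doc);
  } else if (doc.type == "comment") {
    // Comments are keyed by the id of the post they belong to.
    emit([doc.post, 1], doc);
  }
}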

Consider a typical set of blog documents, where each post may be associated with many comments. This function emits every post with the key [doc._id, 0] and every comment with the key [doc.post, 1], where doc.post is the id of the post the comment belongs to. A post and its comments therefore share the first element of the key. Entering “_view/post_comments/all” in the browser gives the following view:

"key":["1",0], "value":{"_id":"1", "type":"post", "text":"My Blog Post"}
"key":["2",0], "value":{"_id":"2", "type":"post", "text":"My 2nd Blog Post"}
"key":["3",0], "value":{"_id":"3", "type":"post", "text":"My 3rd Blog Post"}
"key":["1",1], "value":{"_id":"3", "type":"comment", "text":"You rock dude", "post":"1"}
"key":["2",1], "value":{"_id":"3", "type":"comment", "text":"Man you suck", "post":"2"}

Here we can see all the documents in JSON format, with each comment keyed by the post it belongs to. To see only post 1 and the comments associated with it, add the following parameters to the request: /all?startkey=["1"]&endkey=["1",2]
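The key range can be tried out without a CouchDB server. The following is a rough sketch under stated assumptions: compareKeys is a simplified ordering (numbers before strings, arrays compared element by element, shorter arrays first) and not CouchDB's full collation; queryView, typeRank and blogDocs are hypothetical names; emit is passed in as a second argument here, whereas CouchDB provides it to the map function itself; and the comment ids and the second comment are made up.

```javascript
// Simplified type ordering: numbers, then strings, then everything else.
function typeRank(x) {
  return typeof x === "number" ? 0 : typeof x === "string" ? 1 : 2;
}

// Simplified key comparison; CouchDB's real collation covers more cases.
function compareKeys(a, b) {
  if (Array.isArray(a) && Array.isArray(b)) {
    const shared = Math.min(a.length, b.length);
    for (let i = 0; i < shared; i++) {
      const result = compareKeys(a[i], b[i]);
      if (result !== 0) return result;
    }
    return a.length - b.length; // a shorter prefix sorts first
  }
  if (typeRank(a) !== typeRank(b)) return typeRank(a) - typeRank(b);
  return a < b ? -1 : a > b ? 1 : 0;
}

// The association map function, with emit passed in for this sketch.
function postsWithComments(doc, emit) {
  if (doc.type == "post") {
    emit([doc._id, 0], doc);
  } else if (doc.type == "comment") {
    emit([doc.post, 1], doc);
  }
}

// Build the sorted rows, then keep those between startkey and endkey.
function queryView(mapFn, docs, startkey, endkey) {
  const rows = [];
  docs.forEach(function (doc) {
    mapFn(doc, function (key, value) { rows.push({ key: key, value: value }); });
  });
  rows.sort(function (a, b) { return compareKeys(a.key, b.key); });
  return rows.filter(function (row) {
    return compareKeys(row.key, startkey) >= 0 &&
           compareKeys(row.key, endkey) <= 0;
  });
}

// Made-up documents loosely following the article's blog example.
const blogDocs = [
  { _id: "1", type: "post", text: "My Blog Post" },
  { _id: "2", type: "post", text: "My 2nd Blog Post" },
  { _id: "c1", type: "comment", text: "You rock dude", post: "1" },
  { _id: "c2", type: "comment", text: "Nice post", post: "2" }
];

const postOneRows = queryView(postsWithComments, blogDocs, ["1"], ["1", 2]);
console.log(JSON.stringify(postOneRows.map(function (row) { return row.key; })));
```

Because the rows are sorted by key, post 1 and its comment end up next to each other, and the range from ["1"] to ["1",2] selects exactly that group.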

In relational databases, the cost is generally paid at the point of insertion. With CouchDB, the cost is paid when a view is checked for the first time, because that is when its results are calculated. This means that if the application writes new entries to the database frequently, querying views might slow the application down.
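The trade-off can be pictured with a toy model. This is only a sketch of the idea that writes are cheap and the first query pays for indexing; LazyView is a hypothetical class and does not reflect how CouchDB stores or updates its indexes internally.

```javascript
// Toy model of a view whose index is only brought up to date when queried.
class LazyView {
  constructor(mapFn) {
    this.mapFn = mapFn;
    this.pending = []; // documents written since the last query
    this.rows = [];    // rows that have already been indexed
  }

  // Writing is cheap: the document is only queued.
  write(doc) {
    this.pending.push(doc);
  }

  // Querying pays the indexing cost for every document still pending.
  query() {
    const processed = this.pending.length;
    for (const doc of this.pending) {
      this.mapFn(doc, (key, value) => this.rows.push({ key: key, value: value }));
    }
    this.pending = [];
    return { rows: this.rows, processed: processed };
  }
}

// Made-up workload: many writes followed by two queries.
const view = new LazyView(function (doc, emit) {
  if (doc.type == "post") emit(doc._id, doc.text);
});
for (let i = 0; i < 1000; i++) {
  view.write({ _id: String(i), type: "post", text: "Post " + i });
}

const firstQuery = view.query();  // indexes all 1000 queued documents
const secondQuery = view.query(); // nothing is pending any more
console.log(firstQuery.processed, secondQuery.processed);
```

In this model the first query after a burst of writes does all the work, while a repeated query with no writes in between does none.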

CouchDB uses the _rev field in each document for conflict management. This field is used to determine the winning document when a conflict occurs.
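As a rough illustration of how a winner could be chosen from _rev values alone, the sketch below assumes revisions of the form "N-hash", prefers the higher N, and breaks ties by plain string comparison of the hash. This is a simplification for illustration (it ignores deleted revisions, for instance); parseRev and pickWinner are hypothetical helpers, not CouchDB functions, and the revision strings are made up.

```javascript
// Split a revision string such as "2-abc" into its number and hash parts.
function parseRev(rev) {
  const dash = rev.indexOf("-");
  return {
    generation: parseInt(rev.slice(0, dash), 10),
    hash: rev.slice(dash + 1)
  };
}

// Pick a winner deterministically: highest generation first,
// then the highest hash by string comparison.
function pickWinner(revs) {
  return revs.slice().sort(function (a, b) {
    const ra = parseRev(a);
    const rb = parseRev(b);
    if (ra.generation !== rb.generation) return rb.generation - ra.generation;
    return ra.hash < rb.hash ? 1 : ra.hash > rb.hash ? -1 : 0;
  })[0];
}

// Made-up conflicting revisions.
const sameGeneration = pickWinner(["2-aaa", "2-bbb"]);
const differentGeneration = pickWinner(["3-aaa", "2-zzz"]);
console.log(sameGeneration, differentGeneration);
```

The point of a rule like this is that every replica holding the same set of conflicting revisions picks the same winner without having to coordinate.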

George cited FriendFeed as an example where a schemaless database is the better solution. They moved to a schemaless structure on top of MySQL, storing only simple key/value pairs, because building indexes had become too time-consuming with the amount of data stored in the DB.

The second example was one of George’s own projects, called 5ft Shelf. He explained how his initial DB schema gradually became highly complex. The following pictures show the difference between the two approaches; the second schema is for the CouchDB approach.

[Images: 5ftshelf1, 5ftshelf2]

CouchDB is not likely to be the best solution when definitions need to be fixed and stored objects are unlikely to change. Examples are estate agency applications, where house properties have very strict and unchanging values, and financial applications with very strict definitions.

The slides from the talk are here: http://www.slideshare.net/Georgio_1999/couch-foo-couchdb-on-rails

Sample chapters from the upcoming “CouchDB: The Definitive Guide” book by O’Reilly: http://books.couchdb.org/relax/

George’s blog is: http://www.rowtheboat.com/

Video of the talk can be seen here: http://skillsmatter.com/podcast/ajax-ria/george-palmer-spending-more-time-on-the-couch
