Wednesday, September 24, 2014

What is the best granularity of CouchDB databases?

After the granularity of documents that I covered in my previous post, the typical next question is about organizing the various document types. Is it better to store documents in different databases or group the documents in a single one?

CouchDB does not have a built-in notion of document type. By comparison, MongoDB lets you organize documents into collections and gives you basic functions to manage them (insert, remove, find, drop, etc.). CouchDB has nothing similar. A CouchDB database can contain documents of any shape or form, and you use views to access a subset of them, based on any criteria you like. A commonly used pattern, however, is to make sure every document has a "type" attribute. You can then write views that use the type to return a subset of the documents. For example, if you have documents for orders and line items, you can access them by type using this design document:

   {
     "language": "javascript",
     "views": {
       "orders": {
         "map": "function(doc) { if (doc.type == 'order') emit(null, doc._rev); }"
       },
       "items": {
         "map": "function(doc) { if (doc.type == 'item') emit(null, doc._rev); }"
       }
     }
   }
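
To see what such a map function selects, here is a minimal sketch that runs the "orders" logic in plain JavaScript against a few made-up documents. The `runMap` helper and the sample documents are hypothetical; in CouchDB the view server supplies `emit` and calls the map function for every document in the database.

```javascript
// Hypothetical stand-in for the CouchDB view server: calls a map function on
// each document and collects the rows it emits.
function runMap(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    // CouchDB provides emit() globally; here it is passed in explicitly.
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  return rows;
}

// Same logic as the "orders" view, with emit passed as a parameter.
const ordersMap = (doc, emit) => {
  if (doc.type === "order") emit(null, doc._rev);
};

// Made-up sample documents of two types sharing one database.
const docs = [
  { _id: "order:1001", _rev: "1-abc", type: "order" },
  { _id: "item:1001-1", _rev: "1-def", type: "item" },
  { _id: "order:1002", _rev: "2-ghi", type: "order" },
];

const orderRows = runMap(ordersMap, docs);
console.log(orderRows.map((row) => row.id)); // [ 'order:1001', 'order:1002' ]
```

Each row carries the document id and, as its value, the revision, which is handy when the rows are later used to update or delete the documents.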

When storing different document types in a single database, you also need to make sure that document ids cannot collide. One approach that I recommend is to use the document type as a prefix of the document id. Deleting all or a subset of the documents of a given type can be achieved using the bulk API, by sending the document ids and their revisions with the _deleted attribute set to true. However, this requires fetching the complete list of document ids and revisions before deleting, and the bulk update can fail due to conflicts.
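
As a rough sketch of both ideas, the code below builds type-prefixed ids and turns view rows into a request body for `POST /{db}/_bulk_docs`. The `makeId` and `buildBulkDelete` helpers, the ":" separator, and the sample rows are assumptions made for illustration; the rows are shaped like the output of a view that emits the revision as its value.

```javascript
// Build a type-prefixed document id, e.g. "order:1001".
// The ":" separator is an assumption; any unambiguous separator works.
function makeId(type, localId) {
  return `${type}:${localId}`;
}

// Turn view rows ({ id, key, value } with value = _rev) into a request body
// for POST /{db}/_bulk_docs that marks every listed document as deleted.
function buildBulkDelete(rows) {
  return {
    docs: rows.map((row) => ({ _id: row.id, _rev: row.value, _deleted: true })),
  };
}

// Made-up rows, shaped like the result of querying a by-type view.
const rows = [
  { id: makeId("order", 1001), key: null, value: "1-abc" },
  { id: makeId("order", 1002), key: null, value: "2-ghi" },
];

const body = buildBulkDelete(rows);
console.log(JSON.stringify(body));
```

The response to `_bulk_docs` is an array with one entry per document. Entries that carry an `error` field such as `conflict` point at documents whose revision changed in the meantime; those need to be fetched again and retried.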

On the other hand, you can create separate databases and store a different document type in each. The major drawback of this approach is that you cannot create views across databases. For example, if you want to list all orders and items of a given customer, keeping orders and items in different databases will be a problem. You may also want to set up replication later on, in which case you will need to create one replication per database and monitor more things. But if the data is really not related, using different databases is a nice option. Based on other posts, CouchDB should handle a large number of databases, even if some configuration may be necessary (see this post).
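
To illustrate why a single database helps in the customer example, here is a small sketch of a view that returns the orders and items of one customer together. The `customer_id` attribute, the view logic, and the sample documents are all hypothetical, and the view is simulated in plain JavaScript rather than run by CouchDB.

```javascript
// Map function for a hypothetical "by_customer" view. The customer_id
// attribute is an assumption made for this example.
const byCustomerMap = (doc, emit) => {
  if ((doc.type === "order" || doc.type === "item") && doc.customer_id) {
    emit([doc.customer_id, doc.type], null);
  }
};

// Made-up documents of both types stored in the same database.
const sampleDocs = [
  { _id: "order:1001", type: "order", customer_id: "c1" },
  { _id: "item:1001-1", type: "item", customer_id: "c1" },
  { _id: "order:1002", type: "order", customer_id: "c2" },
];

// Simulate the view: collect the emitted rows, then keep one customer's rows.
// Against a real server the same selection would be a key-range query with
// startkey=["c1"] and endkey=["c1",{}], and rows would come back sorted by key.
const customerRows = [];
for (const doc of sampleDocs) {
  byCustomerMap(doc, (key, value) => customerRows.push({ id: doc._id, key, value }));
}
const c1Ids = customerRows.filter((row) => row.key[0] === "c1").map((row) => row.id);
console.log(c1Ids); // [ 'order:1001', 'item:1001-1' ] (insertion order in this simulation)
```

With orders and items split across two databases, no single view could produce this result; the application would have to query both databases and merge the rows itself.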

The question of database granularity also comes up with multi-tenancy requirements. Is it better to create a single database shared by all tenants, or a database per tenant? I will distinguish between two multi-tenancy use cases:

  • (a) a lot of the infrastructure is shared between tenants and you need to monitor the activity of tenants globally. In this case, using a single database is better, and you need to make sure that all documents have a tenant attribute.
  • (b) each tenant needs to store data of its own and you do not need to aggregate data from different tenants in reports. In this case, using a database per tenant is better.
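
The two cases can be sketched as follows. For case (a), a map function keyed by the tenant attribute lets a view count activity per tenant; for case (b), a helper derives a database name from a tenant identifier. The `tenant` attribute name, the `tenantDbName` helper, the `tenant_` prefix, and the sample documents are assumptions made for illustration.

```javascript
// Case (a): one shared database. A map function keyed by a "tenant"
// attribute; paired with CouchDB's built-in _count reduce, such a view
// gives per-tenant totals.
const byTenantMap = (doc, emit) => {
  if (doc.tenant) emit(doc.tenant, null);
};

// Case (b): one database per tenant. Hypothetical naming helper. CouchDB
// database names must start with a lowercase letter; this helper is
// deliberately conservative and keeps only lowercase letters, digits,
// "_" and "-", although CouchDB allows a few more characters.
function tenantDbName(tenantId) {
  const slug = String(tenantId).toLowerCase().replace(/[^a-z0-9_-]/g, "_");
  return `tenant_${slug}`;
}

// Simulate case (a) on made-up documents: count emitted rows per tenant.
const tenantDocs = [
  { _id: "order:1", type: "order", tenant: "acme" },
  { _id: "order:2", type: "order", tenant: "acme" },
  { _id: "order:3", type: "order", tenant: "Globex Inc" },
];
const counts = {};
for (const doc of tenantDocs) {
  byTenantMap(doc, (key) => {
    counts[key] = (counts[key] || 0) + 1;
  });
}
console.log(JSON.stringify(counts)); // {"acme":2,"Globex Inc":1}
console.log(tenantDbName("Globex Inc")); // tenant_globex_inc
```

Note that the helper can map two different tenant identifiers to the same name (for example "Globex Inc" and "globex_inc"), so a real setup would need tenant identifiers that are already unique after normalization.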

In conclusion, I recommend using a single database to store the various document types, so that you can create views to manage your data with more flexibility. If your concern is multi-tenancy, use a single database or a database per tenant, depending on whether you need to aggregate data from different tenants.
