Tuesday, September 30, 2014

How to generate CouchDB document IDs?

A CouchDB database is just a bag of documents, and you need to make sure the document IDs are unique. But uniqueness is not the only requirement: an ID should also be quick to generate, efficient for CouchDB to handle, as short as possible, and informative when it shows up in logs or monitoring tools.

The first idea is typically to use a UUID. You can generate UUIDs in your host programming language, or ask CouchDB to generate one or more UUIDs for you through the _uuids resource:

GET https://xxx.cloudant.com/_uuids?count=5
{"uuids":["648961210dab8fdffac52cc2f28e143e",
          "648961210dab8fdffac52cc2f28e200f",
          "648961210dab8fdffac52cc2f28e2d2e",
          "648961210dab8fdffac52cc2f28e3263",
          "648961210dab8fdffac52cc2f28e3997"]}
Then, you can create the document using a PUT request with the UUID specified in the URI:
Request:
PUT https://xxx.cloudant.com/blogdb/648961210dab8fdffac52cc2f28e143e
{ "customer" : "c1" ...}

Response:
{"ok":true,"id":"648961210dab8fdffac52cc2f28e143e","rev":"1-9f1fc712b431b44ec6cf09369183a96b"}
Note that if the ID already exists, a conflict (HTTP status 409) is returned:
{"error":"conflict","reason":"Document update conflict."}
Alternatively, you can let CouchDB generate the UUID for you by using a POST request; the generated ID is returned in the response:
Request:
POST https://xxx.cloudant.com/blogdb/
{ "customer" : "c1" ...}

Response:
{"ok":true,"id":"f32fee7ca6ce5a755900525f6c87f346","rev":"1-acea01cbb45b0d08d9f534f9651ef7b1"}
UUIDs are opaque, which is good in some cases. However, they do not help you tell which object you are looking at in logs or in lists of objects, especially when your database holds different types of documents, which all end up mixed together. UUIDs may also not be the best choice for performance, depending on the generation algorithm, because of the way CouchDB updates its B-tree indexes (see comments and some tests).

Finally, generating sequence numbers such as 1, 2, 3... is generally difficult in a distributed environment because it requires some synchronization. Sequence numbers are nice for end users, but they are not a good practice for a scalable implementation.

With this in mind, I recommend building document IDs as follows, although the right choice depends on your application and performance needs:

  • Include the document type
  • Include a related identifier, such as a user ID or another document ID
  • Include a timestamp, such as the number of milliseconds since the Unix epoch
For example, the following id meets my requirements so far: order.1SXDGF.1412020886716. I should add some performance benchmarks later.
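
A small helper can build and split IDs of this shape. This is only a sketch: the function names are hypothetical, and it assumes that neither the document type nor the related identifier contains a dot.

```python
import time


def make_doc_id(doc_type, related_id, timestamp_ms=None):
    """Build an ID of the form <type>.<related id>.<milliseconds since the epoch>."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # current time in milliseconds
    return f"{doc_type}.{related_id}.{timestamp_ms}"


def parse_doc_id(doc_id):
    """Split an ID built by make_doc_id back into its three parts."""
    doc_type, related_id, timestamp = doc_id.split(".")
    return doc_type, related_id, int(timestamp)


print(make_doc_id("order", "1SXDGF", 1412020886716))
```

Two documents of the same type created for the same related identifier within the same millisecond would get the same ID. In that case the PUT returns a conflict, so the caller can retry with a new timestamp.
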
