Sunday, October 5, 2014

How to deal with CouchDB conflicts?

With CouchDB, conflict is actually an overloaded term, and I would like to distinguish between four types of conflicts:

  • Document creation conflicts
  • Document update conflicts
  • Document replication conflicts
  • Document deletion replication conflicts

Document creation conflicts. When you create a new document with a specific ID, CouchDB returns an error if that ID already exists. In this case, the response has HTTP status code 409 and a JSON body like the following:

{"error":"conflict","reason":"Document update conflict."}

This situation can point to a design issue in your application, and you should log these errors so that you can fix them. It can also be expected, as a way to control concurrency: for example, several parts of your application may try to create a document with the same specific ID, knowing that only one will succeed. Finally, it can happen because the IDs generated by the application are not guaranteed to be unique by design; in that case, a good approach is simply to try to create the document again with a newly generated ID.
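The last case, retrying with a newly generated ID, could look roughly like the sketch below. It is not tied to any CouchDB client library: `put_document` is a hypothetical callable standing in for whatever HTTP client you use, and `ConflictError` is an assumed exception that it raises when CouchDB answers 409.

```python
import uuid


class ConflictError(Exception):
    """Assumed stand-in for an HTTP 409 response from CouchDB."""


def create_with_fresh_id(put_document, doc, max_attempts=5):
    """Create `doc` under a newly generated ID, retrying on conflict.

    `put_document(doc_id, doc)` is a hypothetical callable that sends the
    creation request and raises ConflictError when the ID already exists.
    Returns the ID under which the document was finally created.
    """
    for _ in range(max_attempts):
        doc_id = uuid.uuid4().hex  # new candidate ID on every attempt
        try:
            put_document(doc_id, doc)
            return doc_id
        except ConflictError:
            continue  # ID already taken: try again with another one
    raise ConflictError("no free ID found after %d attempts" % max_attempts)
```

The attempt limit is a design choice: if creation keeps failing, the ID generator itself is probably the problem, and it is better to surface the error than to loop forever.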

Document update conflicts. When updating a document, you need to provide its ID and also the revision you are updating. If that revision is no longer the latest one when CouchDB applies the update, you will also get a conflict error. This is similar to the previous case, but this time it means that somebody updated the document after you read the revision you wanted to update. Again, it could highlight a design issue. It could also be intentional. For example, let's assume that a document represents a state that can take the values A, B or C. If two parts of the application read the initial state A, and one wants to change it to B and the other to C, only the first update will be accepted. The part whose update failed can then read the state again, check whether its transition still makes sense, and resend it if so. CouchDB does not support transactions, so this revision check is a useful tool to ensure some consistency.

In other cases, a conflict could mean that different parts of your application tried to update different attributes of the same document. The part whose update failed should then read the document again, merge its changes and send the update again. In short, coping with these cases often means that the application code must be ready to retry the update. However, the retry should be done at the application logic level, and not blindly by fetching the latest revision and sending the same update again, because the logic of the update may no longer be valid. Alternatively, you may want to decompose your documents with a finer granularity to avoid conflicts (see a previous post).

Finally, there is a twist that can happen when using a cluster. If you read a revision from node A but send the update to node B, where replication is lagging, you may get a conflict error as well. When your access is load balanced, you have little control over this. Overall, you always need to protect your updates with a retry mechanism.
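To illustrate a retry done at the application logic level, here is a minimal sketch of the state example. The names are assumptions for illustration only: `get_document` and `put_document` are hypothetical callables wrapping your HTTP client, and `ConflictError` stands for a 409 response. The important point is that the validity of the transition is re-checked after every re-read.

```python
class ConflictError(Exception):
    """Assumed stand-in for an HTTP 409 response from CouchDB."""


def transition_state(get_document, put_document, doc_id,
                     allowed_from, target, max_attempts=3):
    """Move a document's 'state' to `target`, but only while that still makes sense.

    `get_document(doc_id)` returns the current document including its _rev.
    `put_document(doc_id, doc)` sends the update and raises ConflictError
    when the _rev in `doc` is no longer the latest revision.
    Returns True if the transition was written, False if it became invalid.
    """
    for _ in range(max_attempts):
        doc = get_document(doc_id)
        if doc["state"] not in allowed_from:
            # Somebody else moved the state: do not overwrite blindly.
            return False
        updated = dict(doc, state=target)  # keeps the _rev that was just read
        try:
            put_document(doc_id, updated)
            return True
        except ConflictError:
            continue  # re-read the document and re-evaluate the transition
    raise ConflictError("update still conflicting after %d attempts" % max_attempts)
```

A call such as `transition_state(get, put, "job-1", {"A"}, "B")` gives up cleanly if another writer has already moved the document from A to C.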

Document replication conflicts. Replication conflicts are totally different because you will not get a conflict error. They happen when you use a cluster of CouchDB instances with multiple writer nodes. In this case, document updates can occur on any of the nodes concurrently, and the master/master replication will then propagate the updates between nodes. Note that if your cluster is configured with a single writer and multiple readers, you will never face such issues. The behavior differs slightly depending on whether the conflicting update is a simple update or a delete.

In the case of a simple update, it means that when the replication ran from node B to node A, node A already had a new revision of the document. Node A is therefore faced with two conflicting revisions. CouchDB will pick one of the revisions as the winner and will list the other one in the _conflicts attribute. The application has no control over which revision wins, and no way to attempt a merge at the time of the replication. Resolving the conflict means getting the document, getting all the conflicting revisions, merging the updates, saving the document back and deleting the discarded revisions. I see two approaches: you can resolve the conflicts on the fly when you read the document, or by using a background process. This will probably be the subject of another post. In any case, I recommend setting up a background process in your application server to monitor these conflicts. It can query this view:

"conflicts": {
   "map": "function(doc) { if (doc['_conflicts'])  emit(null, doc._rev) }"
}
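The resolution steps described above (merge, save back, delete the discarded revisions) can be prepared as a single bulk update. The sketch below only builds the request body and assumes that the winning document and the conflicting revisions have already been fetched. The function names and the merge policy are hypothetical; a real merge depends entirely on your application logic.

```python
def merge_keep_winner_fields(winner, losers):
    """Example merge policy (an assumption): union of all fields,
    with the winning revision taking precedence on overlap."""
    merged = {}
    for loser in losers:
        merged.update(loser)
    merged.update(winner)
    return merged


def build_resolution(winner, losers, merge):
    """Build a bulk-update body that resolves one replication conflict.

    `winner` is the winning revision of the document and `losers` is the
    list of conflicting revisions, each with its own _id and _rev.
    The merged content is saved on top of the winner, and every losing
    revision is marked as deleted.
    """
    merged = merge(winner, losers)
    merged.pop("_conflicts", None)  # bookkeeping field, not document content
    merged["_id"] = winner["_id"]
    merged["_rev"] = winner["_rev"]
    docs = [merged]
    for loser in losers:
        docs.append({"_id": loser["_id"], "_rev": loser["_rev"], "_deleted": True})
    return {"docs": docs}
```

The resulting dictionary is meant to be sent as the JSON body of a bulk document update, so that the merged document and the deletions of the losing revisions travel in one request.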

Document deletion replication conflicts. When one of the conflicting updates is a delete, the situation is similar, but the conflicting revision is stored in the _deleted_conflicts attribute. It means that node A had changed the document while it was deleted on node B. However, the winning revision is always the updated document and not the deleted one. For me, this is a big concern because I think that, most of the time, the delete operation should win. If you don't pay attention, some part of your application can delete documents and they will reappear after a replication, so beware. Fortunately, I was able to devise an approach to avoid this, by doing two things. First, you can exclude documents with deleted conflicts from all your views, as if they did not exist. Here is an example to get all the orders:

"orders": {
   "map": "function(doc) { if (doc.type == 'order' && !doc['_deleted_conflicts'])  emit(null, doc._rev) }"
}

Then, you can set up a background process in your application server to find and delete any documents with the _deleted_conflicts attribute. This process will have to query the following view and then send a bulk delete.

"deleted_conflicts": {
   "map": "function(doc) { if (doc['_deleted_conflicts'])  emit(null, doc._rev) }"
}
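As a rough sketch of that background step, the function below turns the result of the deleted_conflicts view into the body of a bulk delete. It relies on the view emitting doc._rev as the value, as defined above; the function name is hypothetical, and sending the request is left to your HTTP client.

```python
def bulk_delete_payload(view_result):
    """Build a bulk-delete body from the rows of the deleted_conflicts view.

    Each view row carries the document ID in "id" and, because the map
    function emits doc._rev, the current revision in "value".
    """
    return {
        "docs": [
            {"_id": row["id"], "_rev": row["value"], "_deleted": True}
            for row in view_result["rows"]
        ]
    }
```

If a document changes between the view query and the bulk delete, its entry fails with a conflict, so the process should simply pick it up again on its next run.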

To conclude, CouchDB has some nice features and properties but you have to pay the price of the complexity of conflict resolution.
