Tuesday, December 16, 2014

How to deal with replication conflicts in CouchDB?

In a previous post I introduced the different types of conflicts in CouchDB: creation conflicts, update conflicts, replication conflicts and deletion replication conflicts. This time, I share more details about replication conflicts and how I recommend resolving them.

A replication conflict occurs when the same document is updated on nodes A and B at the same time, before the replication between A and B has been fully processed. Then, when the replication actually occurs, the nodes will have 2 revisions of the same document. CouchDB will pick one of the revisions as the winner, and will list the other one in the _conflicts attribute. CouchDB uses a deterministic algorithm, so the same revision will be picked by all the nodes. This situation can occur in a cluster when the rate of modifications of the document is higher than the throughput of the replication.

The application has no control over the winning revision or how merging could be attempted at the time of the replication. Resolving the conflict means getting the document, getting all the conflicting revisions, merging the updates, saving the document back and deleting the discarded revisions. At least this is what is explained in the wiki and in the Cloudant documentation. You can do this on the fly when accessing documents, in a background process, or both.
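To make these steps concrete, here is a minimal Python sketch of the usual resolution. It assumes the winner was fetched with conflicts=true and each losing revision with rev=<revision>, and it only builds the body you would POST to the database's _bulk_docs endpoint. The names merge_fields and build_resolution are hypothetical, and the merge policy (the winner's values win, fields only the loser has are added) is just a placeholder for whatever makes sense for your documents.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]


def merge_fields(winner: Doc, loser: Doc) -> Doc:
    """Hypothetical merge policy: keep the winner's values and
    add the application fields that only the loser has."""
    merged = dict(winner)
    for key, value in loser.items():
        # Skip CouchDB metadata such as _id and _rev.
        if not key.startswith("_") and key not in merged:
            merged[key] = value
    return merged


def build_resolution(
    winner: Doc,
    losers: List[Doc],
    merge: Callable[[Doc, Doc], Doc] = merge_fields,
) -> Dict[str, List[Doc]]:
    """Build a _bulk_docs body that saves the merged document on top of
    the winning revision and deletes every discarded revision."""
    merged = winner
    for loser in losers:
        merged = merge(merged, loser)
    # _conflicts is read-side metadata, so do not send it back.
    merged = {k: v for k, v in merged.items() if k != "_conflicts"}
    docs = [merged]
    for loser in losers:
        docs.append({"_id": loser["_id"], "_rev": loser["_rev"], "_deleted": True})
    return {"docs": docs}
```

Sending the update and the deletions in one _bulk_docs request keeps the resolution to a single round trip, although each document in the request is still accepted or rejected individually.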

However, there is a problem. When you delete the discarded revisions, they are added to the _deleted_conflicts array. And, as I explained before, this field is also very useful for implementing a simple management of delete conflicts in which delete always wins. But if you do so, the merged documents would be considered deleted, and that's not what you want... Actually, there is no way to distinguish between a real deletion and a deletion due to a conflict resolution. Very few people have mentioned this issue (ref1, ref2).

One solution that was proposed is to add a new attribute, in addition to _deleted, to flag documents deleted by the application. This way, you can perform the merge as explained above. However, you need to update a lot of logic to handle this new flag. When you read a document that has a _deleted_conflicts, you need to get all the revisions to know if one of them was a real delete where the flag was set. If it was set, then you should delete the document; otherwise you can continue. The problem is that you need to check this every time for documents that were just merged.

I would like to propose another solution: adding a new collection of merged revisions to the document. With this approach you keep track of the revisions you have merged, and you do not delete them. Each time you merge a revision, just add it to the merged ones. With this, you can keep the simple deletion process that deletes any document that has a _deleted_conflicts. You also don't need to waste time fetching previous revisions all the time, because if the revisions from _conflicts are already in the merged list, then there is nothing to do. The only drawback is probably that revisions in conflict cannot be purged by the compaction, but if conflicts are rare and if you eventually delete your documents in the application, then that's not a big problem.
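As an illustration, the merged list could be handled along these lines in Python. This is only a sketch under a few assumptions: the field name merged_revs is hypothetical, fetch_rev stands for whatever your client uses to load a given revision, and merge is your own merge function for that document type.

```python
from typing import Any, Callable, Dict, List, Tuple

Doc = Dict[str, Any]

# Hypothetical name for the list of revisions already merged.
MERGED_FIELD = "merged_revs"


def unmerged_revs(doc: Doc) -> List[str]:
    """Revisions listed in _conflicts that are not in the merged list yet."""
    merged = set(doc.get(MERGED_FIELD, []))
    return [rev for rev in doc.get("_conflicts", []) if rev not in merged]


def should_delete(doc: Doc) -> bool:
    """Delete always wins: since merged revisions are never deleted,
    any entry in _deleted_conflicts is a real deletion."""
    return bool(doc.get("_deleted_conflicts"))


def merge_pending(
    doc: Doc,
    fetch_rev: Callable[[str, str], Doc],
    merge: Callable[[Doc, Doc], Doc],
) -> Tuple[Doc, bool]:
    """Merge the pending revisions and record them in the merged list.
    Returns the document to save and whether anything changed."""
    pending = unmerged_revs(doc)
    if not pending:
        return doc, False
    merged = dict(doc)
    for rev in pending:
        merged = merge(merged, fetch_rev(doc["_id"], rev))
    # Keep the identity of the winning revision so the save updates it.
    merged["_id"] = doc["_id"]
    merged["_rev"] = doc["_rev"]
    merged.pop("_conflicts", None)
    merged[MERGED_FIELD] = list(doc.get(MERGED_FIELD, [])) + pending
    return merged, True
```

The conflicting revisions are left in place, so the next read still lists them in _conflicts, but unmerged_revs returns an empty list and no extra fetch is needed.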

So the recipe is the following:

  • Implement a function to merge a list of documents of the same type.
  • When fetching a document, always set the query parameter conflicts=true, and if the returned _conflicts contains revisions that have not been merged yet, merge them, add them to the merged list and save the document before returning it.
  • When accessing a list of documents, always set the query parameter conflicts=true, and merge the documents as explained above if necessary.
  • In the background, implement a process that will identify documents to merge, and merge them as explained above. You need to do that because some documents may not be accessed, and will not be merged on the fly, but still used in views or other aggregations. To identify the list of documents to merge, just create a view that emits the document only if there is at least one revision listed in _conflicts and not in the merged list.
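
For the background process, the view could be sketched as follows. The design document is written as a Python dictionary holding a JavaScript map function, since that is what CouchDB views run. The design document name conflicts, the view name unmerged and the field merged_revs are all hypothetical, and this version emits the document id with the pending revisions rather than the whole document. The run_merge_pass helper is also an assumption: it just walks the rows of a view response and calls a resolve callback that you provide.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical design document: one view listing documents that still
# have revisions in _conflicts that are missing from merged_revs.
DESIGN_DOC = {
    "_id": "_design/conflicts",
    "views": {
        "unmerged": {
            "map": """function (doc) {
  if (!doc._conflicts) { return; }
  var merged = doc.merged_revs || [];
  var pending = doc._conflicts.filter(function (rev) {
    return merged.indexOf(rev) === -1;
  });
  if (pending.length > 0) { emit(doc._id, pending); }
}"""
        }
    },
}


def view_path(db: str) -> str:
    """Path of the view above, relative to the CouchDB server root."""
    return "/%s/_design/conflicts/_view/unmerged" % db


def run_merge_pass(
    view_result: Dict[str, Any],
    resolve: Callable[[str, List[str]], bool],
) -> int:
    """Call resolve(doc_id, pending_revs) for each row of a view response
    and return how many documents were actually merged."""
    resolved = 0
    for row in view_result.get("rows", []):
        if resolve(row["id"], row["value"]):
            resolved += 1
    return resolved


if __name__ == "__main__":
    # Print the body to save as the design document.
    print(json.dumps(DESIGN_DOC, indent=2))
```

Because a merged document no longer matches the condition in the map function, it drops out of the view once it is saved, so the background process can simply run the same pass again later.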