Cloud App Architecture Tips: October 2014

Saturday, October 25, 2014

How to configure WAS Liberty to use Apache Wink and Jackson?

As we have seen in the previous post, IBM WebSphere Liberty comes with JAX-RS and JSON support. In this post, I will show you how to explicitly use Apache Wink for the JAX-RS runtime and use Jackson as the JSON provider instead of the default providers. The updated code can be found on my GitHub repository.

The first step is to update the Maven dependencies to add Wink and Jackson:

<dependency>
  <groupId>org.apache.wink</groupId>
  <artifactId>wink-server</artifactId>
  <version>1.4</version>
</dependency>

<dependency>
  <groupId>com.fasterxml.jackson.jaxrs</groupId>
  <artifactId>jackson-jaxrs-json-provider</artifactId>
  <version>2.4.3</version>
</dependency>

Then you need to declare the servlet in your web.xml file. Instead of using the WAS Liberty JAX-RS servlet, you just need to specify the class name of the Apache Wink servlet.

<servlet>
  <description>JAX-RS Tools Generated - Do not modify</description>
  <servlet-name>JAX-RS Servlet</servlet-name>
  <servlet-class>org.apache.wink.server.internal.servlet.RestServlet</servlet-class>
  <init-param>
    <param-name>javax.ws.rs.Application</param-name>
    <param-value>com.mycloudtips.swagger.MctApplication</param-value>
  </init-param>
  <load-on-startup>1</load-on-startup>
  <enabled>true</enabled>
  <async-supported>false</async-supported>
</servlet>
<servlet-mapping>
  <servlet-name>JAX-RS Servlet</servlet-name>
  <url-pattern>/jaxrs/*</url-pattern>
</servlet-mapping>

And in the application class (MctApplication class) you need to add the Jackson provider.

    @Override
    public Set<Class<?>> getClasses() {
        Set<Class<?>> classes = new HashSet<Class<?>>();

        // Swagger JAX-RS providers and listing resource
        classes.add(ApiDeclarationProvider.class);
        classes.add(ResourceListingProvider.class);
        classes.add(ApiListingResourceJSON.class);

        // Jackson is used as the JSON provider instead of the default one
        classes.add(JacksonJsonProvider.class);

        return classes;
    }

Finally, make sure you remove the jaxrs-1.1 feature from your server.xml and replace it with a plain servlet-3.0 feature. That's it, easy peasy.

Monday, October 20, 2014

How to document your JAX-RS API using Swagger, WAS Liberty Profile and Bluemix?

Swagger has become the de facto standard for REST API documentation. It is also a pretty generic framework and developers need to know how to configure their specific environment. In this post, I will review the steps required to document a JAX-RS API developed with IBM WebSphere Application Server Liberty Profile. The complete example is available on my GitHub repository.

I will assume that you have created a Maven Dynamic Web project in Eclipse (project name and web context root are set to 'swagger-liberty'), and that you have defined a WAS Liberty server environment. Setting up your environment is outside the scope of this post, but you can find more information here.

In order to develop and document your JAX-RS API, you will need to follow these steps:

Declare the required maven dependencies.
Declare the JAX-RS and Swagger servlets.
Declare the Swagger JAX-RS providers and your JAX-RS resources.
Implement and document your APIs using Java annotations.
Copy the Swagger UI web resource files.
Activate the JAX-RS feature of Liberty.
Test your server locally.

The first step is to add the dependencies to your Maven project. You need the Swagger JAX-RS bridge, a logging bridge and the Java EE 6 APIs:

<dependency>
  <groupId>com.wordnik</groupId>
  <artifactId>swagger-jaxrs_2.10</artifactId>
  <version>1.3.10</version>
</dependency>
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-jdk14</artifactId>
  <version>1.7.7</version>
</dependency>
<dependency>
  <groupId>javax</groupId>
  <artifactId>javaee-web-api</artifactId>
  <version>6.0</version>
  <scope>provided</scope>
</dependency>

Then you need to declare the servlets in your web.xml file. The first servlet tells the JAX-RS runtime where to find your JAX-RS application.

<servlet>
  <description>JAX-RS Tools Generated - Do not modify</description>
  <servlet-name>JAX-RS Servlet</servlet-name>
  <servlet-class>com.ibm.websphere.jaxrs.server.IBMRestServlet</servlet-class>
  <init-param>
    <param-name>javax.ws.rs.Application</param-name>
    <param-value>com.mycloudtips.swagger.MctApplication</param-value>
  </init-param>
  <load-on-startup>1</load-on-startup>
  <enabled>true</enabled>
  <async-supported>false</async-supported>
</servlet>
<servlet-mapping>
  <servlet-name>JAX-RS Servlet</servlet-name>
  <url-pattern>/jaxrs/*</url-pattern>
</servlet-mapping>

The second servlet configures the Swagger runtime and indicates where to find the API metadata (the base path, which is made of the web context root and the JAX-RS servlet mapping).

<servlet>
  <servlet-name>DefaultJaxrsConfig</servlet-name>
  <servlet-class>com.wordnik.swagger.jaxrs.config.DefaultJaxrsConfig</servlet-class>
  <init-param>
    <param-name>api.version</param-name>
    <param-value>1.0.0</param-value>
  </init-param>
  <init-param>
    <param-name>swagger.api.basepath</param-name>
    <param-value>/swagger-liberty/jaxrs</param-value>
  </init-param>
  <load-on-startup>2</load-on-startup>
</servlet>

The application class (MctApplication class) is the place where you need to declare the JAX-RS Swagger providers and your JAX-RS resource (MctResource class). Note that I usually declare the resources as singletons so that they are not created at each request.

    @Override
    public Set<Class<?>> getClasses() {
        Set<Class<?>> classes = new HashSet<Class<?>>();

        classes.add(ApiDeclarationProvider.class);
        classes.add(ResourceListingProvider.class);
        classes.add(ApiListingResourceJSON.class);
        return classes;
    }

    @Override
    public Set<Object> getSingletons() {
        Set<Object> singletons = new HashSet<Object>();
        singletons.add(new MctResource());
        return singletons;
    }

The resource class is the place where you develop and document your APIs, using the JAX-RS and Swagger annotations. Here is an example that declares a method returning a list of books:

@GET
@ApiOperation(value = "Returns the list of books from the library.",
              response = MctBook.class, responseContainer = "List")
@ApiResponses(value = { @ApiResponse(code = 200, message = "OK"),
                        @ApiResponse(code = 500, message = "Internal error") })
public Collection<MctBook> getBooks() {
  return library.values();
}

The server will include the Swagger UI, so you need to copy its web resources (index.html, o2c.html, swagger-ui.js, swagger-ui.min.js, and the lib, images and css directories). You can find these files in the Swagger UI JAX-RS sample or in my GitHub repository. You also need to adjust a path in the index.html file to point to your API:

   $(function () {
      window.swaggerUi = new SwaggerUi({
      url: "/swagger-liberty/jaxrs/api-docs",
      ...
    });

At this point, your project should compile fine and you are ready to test. Before doing so, you need to activate the JAX-RS support in Liberty. Remember that Liberty is very flexible and lets you decide which features are loaded. To do so, you need to add the jaxrs-1.1 feature to the server.xml file.

   <featureManager>
      <feature>jaxrs-1.1</feature>
      <feature>localConnector-1.0</feature>
   </featureManager>

Finally, you can add your application to your server runtime and start it. You should then be able to access the Swagger UI:

http://localhost:9080/swagger-liberty/

As an optional step, not covered in this post, you can easily deploy this server to IBM Bluemix.

Friday, October 10, 2014

How to build a document archive with CouchDB?

Let's imagine you have a database where documents are created and deleted, but you need to keep a record of the deleted documents in an archive. How do you set up such an archive with CouchDB? By using the features of CouchDB, and more specifically replication, it is actually pretty simple.

A CouchDB replication copies the new revisions of the documents from a source database to a target one. It also deletes the documents as they are deleted in the source. So if you do nothing, the target database will not be an archive but just a copy of the source database. The trick is to define a filter that does not propagate the deletions. Here is such a simple filter:

"filters": {
      "archiveFilter": "function(doc) {return !doc._deleted }"
     },

Note that you can customize this filter so that you archive only specific types of documents.
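As a minimal sketch of such a customization, the filter below also checks a type attribute. The type field and the 'order' and 'invoice' values are assumptions made for illustration. The function is written as plain JavaScript so that it can be tried locally before being stored as a string in the design document.

```javascript
// Same contract as a CouchDB filter function: return true to replicate the change.
function archiveFilter(doc) {
  // Hypothetical list of document types that should reach the archive.
  var archivedTypes = ['order', 'invoice'];
  // Never propagate deletions, so deleted documents stay in the archive.
  if (doc._deleted) {
    return false;
  }
  // Only archive the selected document types.
  return archivedTypes.indexOf(doc.type) !== -1;
}

// Quick local check with sample documents (made-up data, not from a real database).
var sampleDocs = [
  { _id: 'a', type: 'order' },
  { _id: 'b', type: 'session' },
  { _id: 'c', type: 'order', _deleted: true }
];
var archivedIds = sampleDocs
  .filter(function (doc) { return archiveFilter(doc); })
  .map(function (doc) { return doc._id; });
console.log(archivedIds);
```

Only document 'a' passes the filter: 'b' has a type that is not archived and 'c' is a deletion.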

You can run the replication on demand or continuously. In this case it is generally better to set up a continuous replication so that your archive is kept up to date automatically. The replication document will then look like the following:

{
  "_id": "myarchive",
  "source": {
    "url": "...source URL...",
    "headers": {
      "Authorization": "..token..."
    }
  },
  "target": {
    "url": "...target URL...",
    "headers": {
      "Authorization": "...token..."
    }
  },
  "continuous": true,
  "filter": "archive/archiveFilter"
}

What are the system views you should always create with CouchDB?

Once you have created a database, you will start using it from your application, and monitoring it will become important. Of course, you can use the existing UIs of Cloudant or Futon. However, it is good to have some simple views that quickly help you detect a potential issue. So far, I have used three views to get:

the number of documents having conflicts,
the number of documents having deleted conflicts,
the number of documents by type.

Here is the design document to create these views:

   {
     "language": "javascript",
     "views": {
       "conflicts": {
         "map": "function(doc) { if (doc._conflicts) emit(null, doc._rev) }"
       },
       "deleted_conflicts": {
         "map": "function(doc) { if (doc._deleted_conflicts) emit(null, doc._rev) }"
       },
       "documents": {
         "map": "function(doc) { if (doc.type) { emit(doc.type, 1) } else { emit('other', 1) } }",
         "reduce": "_sum"
       }
     }
   }

Of course, replace the attribute type with the attribute you use to distinguish between documents.
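To get a feel for what the documents view returns, the map and reduce steps can be simulated locally. This is only a sketch: runDocumentsView is a hypothetical helper, not part of CouchDB, and it mimics the built-in _sum reduce grouped by key. The sample documents are made up.

```javascript
// Same logic as the "documents" map function, with emit passed in explicitly.
function documentsMap(doc, emit) {
  if (doc.type) {
    emit(doc.type, 1);
  } else {
    emit('other', 1);
  }
}

// Hypothetical helper: runs the map over all documents, then sums the emitted
// values per key, like the _sum reduce does when the view is grouped by key.
function runDocumentsView(docs) {
  var counts = {};
  docs.forEach(function (doc) {
    documentsMap(doc, function (key, value) {
      counts[key] = (counts[key] || 0) + value;
    });
  });
  return counts;
}

var viewResult = runDocumentsView([
  { _id: '1', type: 'book' },
  { _id: '2', type: 'book' },
  { _id: '3', type: 'order' },
  { _id: '4' }
]);
console.log(viewResult);
```

Against a real database, the view has to be queried with group=true to get one row per type. Without it, the reduce returns a single row with the total number of documents.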

Sunday, October 5, 2014

How to deal with CouchDB conflicts?

With CouchDB, conflict is actually an overloaded term, and I would like to distinguish between four types of conflicts:

Document creation conflicts
Document update conflicts
Document replication conflicts
Document deletion replication conflicts

Document creation conflicts. When creating a new document with a specific ID, an error is returned if the ID already exists. In this case, the HTTP status code 409 is returned with a JSON document such as the following:

{"error":"conflict","reason":"Document update conflict."}

This situation could be caused by a design issue in your application, and you should log these errors to fix them. It could also be expected, as a way to control concurrency. For example, you may have several parts of your application trying to create a document with a specific ID, and you know that only one will succeed. Finally, it could happen because the IDs generated by the application are not unique by design. In this case, a good approach is simply to try to create the document again with a newly generated ID.
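The last case, retrying with a newly generated ID, can be sketched as follows. The code runs against an in-memory stand-in for the database, so createFakeDb, createWithRetry and the ID generator are all hypothetical names. A real implementation would send a PUT request to CouchDB and check for the 409 status in the same way.

```javascript
// In-memory stand-in for a CouchDB database (assumption, for illustration only).
function createFakeDb() {
  var docs = {};
  return {
    // Mimics the creation of a document: 409 when the ID already exists.
    create: function (id, body) {
      if (docs[id]) {
        return { status: 409, error: 'conflict' };
      }
      docs[id] = Object.assign({ _id: id }, body);
      return { status: 201, id: id };
    }
  };
}

// Tries to create the document, generating a new ID after each conflict.
function createWithRetry(db, generateId, body, maxAttempts) {
  for (var attempt = 1; attempt <= maxAttempts; attempt++) {
    var result = db.create(generateId(), body);
    if (result.status !== 409) {
      return result;
    }
    // 409: the generated ID is already taken, loop and try another one.
  }
  throw new Error('Could not create the document after ' + maxAttempts + ' attempts');
}

// Usage with a deliberately weak, hypothetical ID generator.
var demoDb = createFakeDb();
var weakId = function () { return 'doc-' + Math.floor(Math.random() * 3); };
console.log(createWithRetry(demoDb, weakId, { title: 'first' }, 20));
```

Bounding the number of attempts keeps a badly designed generator from looping forever. When the limit is reached, the error should be logged like any other design issue.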

Document update conflicts. When updating a document, you need to provide its ID and also the revision you are updating. If that revision is no longer the latest one when CouchDB tries to update the database, you will also get a conflict error. This is similar to the previous case, but this time it means that somebody updated the document after you read the revision you wanted to update. Again, it could highlight a design issue. It could also be intentional. For example, let's assume that a document represents a state that can take the values A, B or C. If two parts of the application read the initial state A, and one wants to change it to B and the other to C, only the first update will be accepted. The part whose update failed can then read the state again, check whether the transition still makes sense and, if so, resend it. Remember that CouchDB does not support transactions, so such a mechanism is useful to ensure some consistency.

In other cases, a conflict could mean that different parts of your application tried to update different attributes of the same document. The part whose update failed should then read the document again, merge its update and send it again. In short, coping with these cases often means that the application code must be ready to retry the update. However, the retry should be done at the application logic level, and not blindly by getting the latest revision and sending the update again, because the logic of the update may no longer be valid. Alternatively, you may want to decompose your documents with a finer granularity to avoid conflicts (see a previous post).

Finally, there is a twist that could happen when using a cluster. If for some reason you got a revision from node A, but update the document on node B where the replication is late, you may get a conflict error as well. As your access is load balanced, you have little control over this. Overall, you always need to protect your updates with a retry mechanism.
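The state example can be sketched as a retry loop that re-evaluates the application logic on every attempt. The code below uses an in-memory stand-in for a single document, with integer revisions instead of real CouchDB revision strings. createFakeDoc and transition are hypothetical names, not CouchDB APIs.

```javascript
// In-memory stand-in for one CouchDB document (assumption, for illustration only).
function createFakeDoc(initialState) {
  var current = { _id: 'job', _rev: 1, state: initialState };
  return {
    get: function () { return Object.assign({}, current); },
    // Mimics an update with a revision: 409 unless the revision is the latest one.
    put: function (doc) {
      if (doc._rev !== current._rev) {
        return { status: 409 };
      }
      current = Object.assign({}, doc, { _rev: current._rev + 1 });
      return { status: 201, rev: current._rev };
    }
  };
}

// Applies a state transition and re-checks the application logic on every retry.
function transition(store, fromState, toState, maxAttempts) {
  for (var attempt = 1; attempt <= maxAttempts; attempt++) {
    var doc = store.get();
    if (doc.state !== fromState) {
      // Somebody else moved the state: the transition no longer makes sense.
      return { applied: false, state: doc.state };
    }
    doc.state = toState;
    if (store.put(doc).status !== 409) {
      return { applied: true, state: toState };
    }
    // 409: read the document again and re-evaluate before resending.
  }
  throw new Error('Too many conflicts');
}

var jobStore = createFakeDoc('A');
console.log(transition(jobStore, 'A', 'B', 3)); // accepted, the state was still A
console.log(transition(jobStore, 'A', 'C', 3)); // refused, the state is already B
```

The important point is that the check on fromState sits inside the loop. A blind retry would fetch the latest revision and force the state to C even though the document is no longer in state A.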

Document replication conflicts. The replication conflicts are totally different because you will not get a conflict error. This happens when you use a cluster of CouchDB instances with multiple writer nodes. In this case, document updates could occur on any of the nodes concurrently, and the master/master replication will then propagate the updates between nodes. Note that if your cluster is configured with a single writer and multiple readers, you will never face such issues. The behavior is slightly different depending on whether the update is a simple update or a delete.

In the case of a simple update, it means that when the replication occurred from node B to node A, node A already had a new revision of the document. So node A faces two conflicting revisions. CouchDB will pick one of the revisions as the winner, and will store the other one in the _conflicts attribute. The application has no control over the winning revision or over how a merge could be attempted at the time of the replication. Resolving the conflict means getting the document, getting all the conflicting revisions, merging the updates, saving the document back and deleting the discarded revisions. I see two approaches: you can resolve the conflicts on the fly when you read the document, or by using a background process. This will probably be the subject of another post. In any case, I recommend setting up a background process in your application server to monitor these conflicts, which can query this view:

"conflicts": {
   "map": "function(doc) { if (doc['_conflicts'])  emit(null, doc._rev) }"
}
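The resolution steps described above (get the winner and the conflicting revisions, merge, save, delete the discarded revisions) can be sketched as a function that prepares a single _bulk_docs request body. The merge strategy shown here, fillMissingFields, is a hypothetical example. The right strategy depends entirely on your application. The sketch assumes the winner was fetched with conflicts=true and each losing revision with its rev parameter.

```javascript
// Builds a _bulk_docs payload that resolves a conflict: one merged update on the
// winning revision, plus a deletion of every losing revision.
function buildResolution(winner, losers, merge) {
  // Merge all conflicting revisions into a copy of the winner.
  var merged = losers.reduce(function (acc, loser) {
    return merge(acc, loser);
  }, Object.assign({}, winner));
  // Keep the identity and revision of the winner and drop the conflict marker.
  merged._id = winner._id;
  merged._rev = winner._rev;
  delete merged._conflicts;
  var deletions = losers.map(function (loser) {
    return { _id: loser._id, _rev: loser._rev, _deleted: true };
  });
  return { docs: [merged].concat(deletions) };
}

// Hypothetical merge strategy: keep the winner's fields, add any field it is missing.
function fillMissingFields(target, other) {
  Object.keys(other).forEach(function (key) {
    if (!(key in target)) {
      target[key] = other[key];
    }
  });
  return target;
}

// Made-up revisions of the same document.
var winningRev = { _id: 'book-1', _rev: '2-aaa', title: 'CouchDB', _conflicts: ['2-bbb'] };
var losingRev = { _id: 'book-1', _rev: '2-bbb', title: 'Couch', author: 'Jane' };
console.log(JSON.stringify(buildResolution(winningRev, [losingRev], fillMissingFields)));
```

Sending the merged document and the deletions in the same request keeps the window during which the document is still flagged as conflicting short.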

Document deletion replication conflicts. When one of the conflicting updates is a delete, the situation is similar but the conflicting revision is stored in the _deleted_conflicts attribute. It means that node A changed the document while it was deleted on node B. However, the winning revision is always the updated document and not the deleted one. To me, this is a big concern because I think that most of the time the delete operation should win. If you don't pay attention, some part of your application can delete documents and they will reappear after a replication, so beware. Fortunately, I was able to devise an approach to avoid this, by doing two things. First, you can exclude documents with deleted conflicts from all your views, as if they did not exist. Here is an example to get all the orders:

"orders": {
   "map": "function(doc) { if (doc.type == 'order' && !doc['_deleted_conflicts'])  emit(null, doc._rev) }"
}

Then, you can set up a background process in your application server to get and delete any documents with the _deleted_conflicts attribute. This process will have to query the following view and then send a bulk delete.

"deleted_conflicts": {
   "map": "function(doc) { if (doc['_deleted_conflicts'])  emit(null, doc._rev) }"
}
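As a rough sketch of that background step, the rows returned by the deleted_conflicts view can be turned into the body of a bulk delete request. buildBulkDelete is a hypothetical helper and the rows are made-up samples. The sketch relies on the view emitting doc._rev as the value, as defined above.

```javascript
// Turns the rows of the deleted_conflicts view into a _bulk_docs deletion payload.
// Each row has the shape { id, key, value }, where value is the emitted doc._rev.
function buildBulkDelete(rows) {
  return {
    docs: rows.map(function (row) {
      return { _id: row.id, _rev: row.value, _deleted: true };
    })
  };
}

// Made-up view rows for two documents that reappeared after a replication.
var sampleRows = [
  { id: 'order-1', key: null, value: '3-abc' },
  { id: 'order-7', key: null, value: '5-def' }
];
console.log(JSON.stringify(buildBulkDelete(sampleRows)));
```

A document may have changed between the view query and the delete. In that case its entry comes back as a conflict in the bulk response and is simply picked up again on the next run.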

To conclude, CouchDB has some nice features and properties but you have to pay the price of the complexity of conflict resolution.
