Tuesday, December 16, 2014

How to deal with replication conflicts in CouchDB?

In a previous post I introduced the different types of conflicts in CouchDB: creation conflicts, update conflicts, replication conflicts and deletion replication conflicts. This time, I share more details about replication conflicts and how I recommend resolving them.

A replication conflict occurs when the same document is updated on nodes A and B at the same time, while the replication between A and B has not yet been fully processed. When the replication actually runs, the nodes end up with two revisions of the same document. CouchDB picks one of the revisions as the winner and stores the other one in the _conflicts attribute. CouchDB uses an algorithm that ensures the same revision is picked as the winner by all nodes. This situation can occur in a cluster when the rate of modification of a document is higher than the throughput of the replication.

The application has no control over the winning revision or over how a merge could be attempted at the time of the replication. Resolving the conflict means getting the document, fetching all the conflicting revisions, merging the updates, saving the document back and deleting the discarded revisions. At least, this is what is explained in the CouchDB wiki and in the Cloudant documentation. You can do this on the fly when accessing documents, in a background process, or both.
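As an illustration, here is a minimal sequence of HTTP calls implementing this recipe (the database name, document ID, revisions and content are made up for the example):

Request: GET https://xxx.cloudant.com/mydb/order1?conflicts=true
Response: {"_id":"order1","_rev":"3-aaa","customer":"c1","_conflicts":["3-bbb"]}

Request: GET https://xxx.cloudant.com/mydb/order1?rev=3-bbb
Response: {"_id":"order1","_rev":"3-bbb","customer":"c2"}

Request: PUT https://xxx.cloudant.com/mydb/order1 with the merged content and "_rev":"3-aaa"

Request: DELETE https://xxx.cloudant.com/mydb/order1?rev=3-bbb

The first request returns the winning revision together with the conflicting ones, the second fetches a conflicting revision so it can be merged, the third saves the merged document on top of the winner, and the last one deletes the discarded revision.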

However, there is a problem. When you delete the discarded revisions, these revisions are added to the _deleted_conflicts array. And, as I explained before, this field is also very useful to implement a simple management of delete conflicts so that delete always wins. But if you do so, the merged documents would then be considered deleted, and that is not what you want... There is actually no way to distinguish between a real deletion and a deletion caused by a conflict resolution. Very few people have mentioned this issue (ref1, ref2).

One proposed solution is to add a new attribute, in addition to _deleted, to flag documents deleted by the application. This way, you can perform the merge as explained above. However, you need to update a lot of logic to handle this new flag. When you read a document that has a _deleted_conflicts, you need to fetch all those revisions to know whether one of them was a real delete, i.e. had the flag set. If it was set, you should delete the document; otherwise you can continue. The problem is that you need to perform this check all the time, even for documents that were simply merged.

I would like to propose another solution: adding a new collection of merged revisions to the document. With this approach you keep track of the revisions you have merged, and you do not delete them. Each time you merge a revision, you simply add it to the merged ones. This way, you can keep the simple deletion process that deletes any document having a _deleted_conflicts. You also do not waste time fetching previous revisions all the time, because if the revisions from _conflicts are already in the merged list, there is nothing to do. The only drawback is probably that the conflicting revisions cannot be purged by compaction, but if conflicts are rare and if the application eventually deletes its documents, that is not a big problem.
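For example, a merged document could look like this, where the merged attribute is my own convention (not a built-in CouchDB field) and lists the conflicting revisions that have already been merged:

{
  "_id": "order1",
  "_rev": "4-ccc",
  "type": "order",
  "customer": "c1",
  "merged": ["3-bbb"]
}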

So the recipe is the following:

  • Implement a function to merge a list of documents of the same type.
  • When fetching a document, always set the query parameter conflicts=true, and if the returned _conflicts contains revisions that have not been merged yet, merge them, add them to the merged list and save the document before returning it.
  • When accessing lists of documents, always set the query parameter conflicts=true, and merge the documents as explained above if necessary.
  • In the background, implement a process that identifies the documents to merge, and merge them as explained above. You need this because some documents may never be accessed directly, and thus never merged on the fly, while still being used in views or other aggregations. To identify the documents to merge, just create a view that emits a document only if at least one revision listed in _conflicts is not in the merged list (see the sketch below).
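Here is a sketch of the map function of such a view, assuming the merged revisions are stored in a merged attribute as described above:

function(doc) {
  // emit the document if at least one conflicting revision has not been merged yet
  if (doc._conflicts) {
    var merged = doc.merged || [];
    for (var i = 0; i < doc._conflicts.length; i++) {
      if (merged.indexOf(doc._conflicts[i]) < 0) {
        emit(null, doc._rev);
        return;
      }
    }
  }
}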

Saturday, October 25, 2014

How to configure WAS Liberty to use Apache Wink and Jackson?

As we have seen in the previous post, IBM WebSphere Liberty comes with JAX-RS and JSON support. In this post, I will show you how to explicitly use Apache Wink for the JAX-RS runtime and use Jackson as the JSON provider instead of the default providers. The updated code can be found on my GitHub repository.

The first step is to update the maven dependencies to add Wink and Jackson like this:

<dependency>
  <groupId>org.apache.wink</groupId>
  <artifactId>wink-server</artifactId>
  <version>1.4</version>
</dependency>

<dependency>
  <groupId>com.fasterxml.jackson.jaxrs</groupId>
  <artifactId>jackson-jaxrs-json-provider</artifactId>
  <version>2.4.3</version>
</dependency>

Then you need to declare the servlet in your web.xml file. Instead of using the WAS Liberty JAX-RS servlet, you just need to indicate the class name of the Apache Wink servlet.

<servlet>
  <description>JAX-RS Tools Generated - Do not modify</description>
  <servlet-name>JAX-RS Servlet</servlet-name>
  <servlet-class>org.apache.wink.server.internal.servlet.RestServlet</servlet-class>
  <init-param>
    <param-name>javax.ws.rs.Application</param-name>
    <param-value>com.mycloudtips.swagger.MctApplication</param-value>
  </init-param>
  <load-on-startup>1</load-on-startup>
  <enabled>true</enabled>
  <async-supported>false</async-supported>
</servlet>
<servlet-mapping>
  <servlet-name>JAX-RS Servlet</servlet-name>
  <url-pattern>/jaxrs/*</url-pattern>
</servlet-mapping>

In the application class (the MctApplication class), you need to add the Jackson provider.

    @Override
    public Set<Class<?>> getClasses() {
        Set<Class<?>> classes = new HashSet<Class<?>>();

        classes.add(ApiDeclarationProvider.class);
        classes.add(ResourceListingProvider.class);
        classes.add(ApiListingResourceJSON.class);

        classes.add(JacksonJsonProvider.class);

        return classes;
    }

Finally, make sure you remove the jaxrs-1.1 feature from your server.xml and replace it with a simple servlet-3.0. That's it, easy peasy.
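Your featureManager section should then look something like this (a minimal sketch; keep any other features you already use, such as localConnector-1.0):

   <featureManager>
     <feature>servlet-3.0</feature>
     <feature>localConnector-1.0</feature>
   </featureManager>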

Monday, October 20, 2014

How to document your JAX-RS API using Swagger, WAS Liberty Profile and Bluemix?

Swagger has become the de facto standard for REST API documentation. It is also a pretty generic framework and developers need to know how to configure their specific environment. In this post, I will review the steps required to document a JAX-RS API developed with IBM WebSphere Application Server Liberty Profile. The complete example is available on my GitHub repository.

I will assume that you have created a Maven Dynamic Web project in Eclipse (project name and web context root are set to 'swagger-liberty'), and that you have defined a WAS Liberty server environment. Setting up your environment is outside the scope of this post, but you can find more information here.

In order to develop and document your JAX-RS API, you will need to follow these steps:

  • Declare the required maven dependencies.
  • Declare the JAX-RS and Swagger servlets.
  • Declare the Swagger JAX-RS providers and your JAX-RS resources.
  • Implement and document your APIs using Java annotations.
  • Copy the Swagger UI web resource files.
  • Activate the JAX-RS feature of Liberty.
  • Test your server locally.

The first step is to add the Maven dependencies to your project. You need to add the Swagger JAX-RS bridge, the logging bridge and the Java EE 6 APIs:

<dependency>
  <groupId>com.wordnik</groupId>
  <artifactId>swagger-jaxrs_2.10</artifactId>
  <version>1.3.10</version>
</dependency>
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-jdk14</artifactId>
  <version>1.7.7</version>
</dependency>
<dependency>
  <groupId>javax</groupId>
  <artifactId>javaee-web-api</artifactId>
  <version>6.0</version>
  <scope>provided</scope>
</dependency>

Then you need to declare the servlets in your web.xml file. The first servlet is used to tell the JAX-RS runtime where to find your JAX-RS application.

<servlet>
  <description>JAX-RS Tools Generated - Do not modify</description>
  <servlet-name>JAX-RS Servlet</servlet-name>
  <servlet-class>com.ibm.websphere.jaxrs.server.IBMRestServlet</servlet-class>
  <init-param>
    <param-name>javax.ws.rs.Application</param-name>
    <param-value>com.mycloudtips.swagger.MctApplication</param-value>
  </init-param>
  <load-on-startup>1</load-on-startup>
  <enabled>true</enabled>
  <async-supported>false</async-supported>
</servlet>
<servlet-mapping>
  <servlet-name>JAX-RS Servlet</servlet-name>
  <url-pattern>/jaxrs/*</url-pattern>
</servlet-mapping>
The second servlet configures the Swagger runtime and indicates where to find the API metadata (the base path, which is made of the web context root and the JAX-RS servlet mapping).
<servlet>
  <servlet-name>DefaultJaxrsConfig</servlet-name>
  <servlet-class>com.wordnik.swagger.jaxrs.config.DefaultJaxrsConfig</servlet-class>
  <init-param>
    <param-name>api.version</param-name>
    <param-value>1.0.0</param-value>
  </init-param>
  <init-param>
    <param-name>swagger.api.basepath</param-name>
    <param-value>/swagger-liberty/jaxrs</param-value>
  </init-param>
  <load-on-startup>2</load-on-startup>
</servlet>

The application class (the MctApplication class) is where you need to declare the Swagger JAX-RS providers and your JAX-RS resource (the MctResource class). Note that I usually declare the resources as singletons so that they are not re-created on each request.

    @Override
    public Set<Class<?>> getClasses() {
	Set<Class<?>> classes = new HashSet<Class<?>>();

	classes.add(ApiDeclarationProvider.class);
	classes.add(ResourceListingProvider.class);
	classes.add(ApiListingResourceJSON.class);
	return classes;
    }
    @Override
    public Set<Object> getSingletons() {
	Set<Object> singletons = new HashSet<Object>();
	singletons.add(new MctResource());
	return singletons;
    }

The resource class is the place where you can develop and document your APIs. You need to use the JAX-RS and Swagger annotations. Here is an example to declare a method returning a list of books:

@GET
@ApiOperation(value = "Returns the list of books from the library.", 
              response = MctBook.class, responseContainer = "List")
@ApiResponses(value = { @ApiResponse(code = 200, message = "OK"),
	@ApiResponse(code = 500, message = "Internal error") })
public Collection<MctBook> getBooks() {
  return library.values();
}

The application will also serve the Swagger UI, so you need to copy its web resources (index.html, o2c.html, swagger-ui.js, swagger-ui.min.js, and the lib, images and css files and directories). You can find these files in the Swagger UI JAX-RS sample or in my GitHub repository. You also need to adjust a path in the index.html file to point to your API:

   $(function () {
      window.swaggerUi = new SwaggerUi({
      url: "/swagger-liberty/jaxrs/api-docs",
      ...
    });

At this point, your project should compile fine and you should be ready to test. Before doing so, you need to activate the JAX-RS support in Liberty. Remember that Liberty is very flexible and lets you decide which features are loaded. To do so, add the jaxrs-1.1 feature to the server.xml file.

   <featureManager>
     <feature>jaxrs-1.1</feature>
     <feature>localConnector-1.0</feature>
   </featureManager>

Finally, you can add your application to your server runtime and start it. You should then be able to access the Swagger UI:

http://localhost:9080/swagger-liberty/
As an optional step, not covered in this post, you can easily deploy this server to IBM Bluemix.

Friday, October 10, 2014

How to build a document archive with CouchDB?

Let's imagine you have a database where documents are created and deleted, but you need to keep a record of the deleted documents in an archive. How do you set up such an archive with CouchDB? Well, by using the features of CouchDB, and more specifically the replication, it is actually pretty simple.

A CouchDB replication copies the new revisions of the documents from a source database to a target one. It also deletes documents in the target as they are deleted in the source. So if you do nothing, the target database will not be an archive but just a copy of the source database. The trick is to define a filter that does not propagate the deletions. Here is such a simple filter:

"filters": {
      "archiveFilter": "function(doc) {return !doc._deleted }"
     },
Note that this filter must be defined in a design document of the source database (here _design/archive, matching the archive/archiveFilter reference used below), and you can customize it so that you archive only specific types of documents.

You can run the replication on demand or continuously. In this case it is generally better to set up a continuous replication so that the archive stays up to date automatically. The replication document then looks like the following:

{
  "_id": "myarchive",
  "source": {
    "url": "...source URL...",
    "headers": {
      "Authorization": "..token..."
    }
  },
  "target": {
    "url": "...target URL...",
    "headers": {
      "Authorization": "...token..."
    }
  },
  "continuous": true,
  "filter": "archive/archiveFilter"
}
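To start it, you can simply save this document into the _replicator database (the URL is a placeholder):

PUT https://xxx.cloudant.com/_replicator/myarchive

CouchDB and Cloudant then run the replication continuously and update the document with its replication state.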

What are the system views you should always create with CouchDB?

Once you have created a database, you will start using it from your application, and monitoring it becomes important. Of course, you can use the existing UIs of Cloudant or Futon. However, it is good to have a few simple views that quickly help you detect a potential issue. So far, I have used three views to get:

  • the number of documents having conflicts,
  • the number of documents having deleted conflicts,
  • the number of documents by type.

Here is the design document to create these views:

   {
     "language": "javascript",
     "views": {
       "conflicts": {
         "map": "function(doc) { if (doc._conflicts)  emit(null, doc._rev) }"
       },
       "deleted_conflicts": {
         "map": "function(doc) { if (doc._deleted_conflicts)  emit(null, doc._rev) }"
       },
       "documents": {
         "reduce": "_sum",
         "map": "function(doc) { if (doc.type)  { emit(doc.type, 1) } else { emit('other',1) }}"
       }
     }
   }
Of course, replace the attribute type with the attribute you use to distinguish between documents.
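For example, assuming the design document above is saved as _design/monitoring, you can get the number of documents by type by querying the documents view with grouping enabled (the counts here are made up):

Request: GET https://xxx.cloudant.com/mydb/_design/monitoring/_view/documents?group=true
Response: {"rows":[{"key":"item","value":42},{"key":"order","value":17}]}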

Sunday, October 5, 2014

How to deal with CouchDB conflicts?

With CouchDB, conflict is actually an overloaded term, and I would like to distinguish between four types of conflicts:

  • Document creation conflicts
  • Document update conflicts
  • Document replication conflicts
  • Document deletion replication conflicts

Document creation conflicts. When creating a new document with a specific ID, an error is returned if the ID already exists. In this case, the HTTP error code 409 is returned with a JSON document like the following:

{"error":"conflict","reason":"Document update conflict."}
This situation could be caused by a design issue in your application, and you should log these errors to fix them. It could also be expected, as a way to control concurrency. For example, you may have several parts of your application trying to create a document with a specific ID, knowing that only one will succeed. Finally, it could happen because the IDs generated by the application are not unique by design; in this case, a good approach is simply to try to create the document again with a newly generated ID.

Document update conflicts. When updating a document, you need to provide its ID and also the revision you are updating. If that revision is no longer the latest one when CouchDB tries to update the database, you will also get a conflict error. This is similar to the previous case, but this time it means that somebody updated the document since you got the revision you wanted to update. Again, it could highlight a design issue. It could also be intentional. For example, let's assume that a document represents a state and can take the values A, B or C. If different parts of the application read the initial state A, and one wants to change it to B and the other to C, only the first update will be accepted. The update that failed can then re-read the state, check whether the transition still makes sense, and resend it. Remember that CouchDB does not support transactions, but having such a tool is useful to ensure some consistency. In other cases, it could mean that different parts of your application tried to update different attributes of the same document. The update that failed should then re-read the document, merge the changes and send the update again.

In short, coping with these cases often means that the application code must be ready to retry the update. However, the retry should be done at the application logic level, and not blindly by getting the latest revision and resending the update, because the logic of the update may no longer be valid. Alternatively, you may want to decompose your documents with a finer granularity to avoid conflicts (see a previous post). Finally, there is a twist that can happen when using a cluster: if for some reason you got a revision from node A but update the document on node B where the replication is late, you may get a conflict error as well. As your access is load balanced, you have little control over this. Globally, you always need to protect your updates with a retry mechanism.
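As a sketch, the interplay with a retry typically looks like this, reusing the state example above (URLs, IDs and revisions are made up):

Request: GET https://xxx.cloudant.com/mydb/state1
Response: {"_id":"state1","_rev":"1-aaa","value":"A"}

Request: PUT https://xxx.cloudant.com/mydb/state1 with {"_id":"state1","_rev":"1-aaa","value":"C"}
Response: {"error":"conflict","reason":"Document update conflict."}

Request: GET https://xxx.cloudant.com/mydb/state1
Response: {"_id":"state1","_rev":"2-bbb","value":"B"}

Request: PUT https://xxx.cloudant.com/mydb/state1 with {"_id":"state1","_rev":"2-bbb","value":"C"}
Response: {"ok":true,"id":"state1","rev":"3-ccc"}

The first PUT is rejected with a 409 because another part of the application already moved the state from A to B. The application re-reads the document, checks that the transition to C still makes sense, and resends the update against the latest revision.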

Document replication conflicts. Replication conflicts are totally different because you will not get a conflict error. They happen when you use a cluster of CouchDB instances with multiple writer nodes. In this case, document updates can occur on any of the nodes concurrently, and the master/master replication then propagates the updates between nodes. Note that if your cluster is configured with a single writer and multiple readers, you will never face such issues. The behavior is slightly different depending on whether the conflicting update is a simple update or a delete.

In the case of a simple update, it means that when the replication occurred from node B to node A, node A already had a new revision of the document. Node A is then faced with two conflicting revisions. CouchDB picks one of the revisions as the winner and stores the other one in the _conflicts attribute. The application has no control over the winning revision or over how a merge could be attempted at the time of the replication. Resolving the conflict means getting the document, fetching all the conflicting revisions, merging the updates, saving the document back and deleting the discarded revisions. I see two approaches: you can resolve the conflicts on the fly when you read the document, or by using a background process. This will probably be the subject of another post. In any case, I recommend setting up a background process in your application server to monitor these conflicts, and it can query this view:

"conflicts": {
   "map": "function(doc) { if (doc['_conflicts'])  emit(null, doc._rev) }"
}

Document deletion replication conflicts. When one of the conflicting updates is a delete, the situation is similar, but the conflicting revision is stored in the _deleted_conflicts attribute. It means that node A changed the document while it was deleted on node B. However, the winning document is always the updated document, not the deleted one. To me, this is a big concern because I think that, most of the time, the delete operation should win. If you don't pay attention, some part of your application can delete documents and they will reappear after a replication, so be aware... Fortunately, I was able to devise an approach to avoid this, by doing two things. First, you can exclude documents with deleted conflicts from all your views, as if they did not exist. Here is an example to get all the orders:

"orders": {
   "map": "function(doc) { if (doc.type == 'order' && !doc['_deleted_conflicts'])  emit(null, doc._rev) }"
}
Then, you can set up a background process in your application server to get and delete any document having the _deleted_conflicts attribute. This process has to query the following view and then send a bulk delete.
"deleted_conflicts": {
   "map": "function(doc) { if (doc['_deleted_conflicts'])  emit(null, doc._rev) }"
}
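The bulk delete itself is a single request to the _bulk_docs endpoint, built from the IDs and revisions returned by the view (a sketch with made-up values):

POST https://xxx.cloudant.com/mydb/_bulk_docs
{"docs":[
  {"_id":"order12","_rev":"3-aaa","_deleted":true},
  {"_id":"order42","_rev":"5-bbb","_deleted":true}
]}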

To conclude, CouchDB has some nice features and properties but you have to pay the price of the complexity of conflict resolution.

Tuesday, September 30, 2014

How to generate CouchDB document ID?

A CouchDB database is just a bag of documents, and you need to make sure the document IDs are unique. But this is not the only requirement: the ID should also be quick to generate, efficient for CouchDB to use, as short as possible, and provide useful information in logs or monitoring tools.

The first idea is typically to use UUIDs. You can generate UUIDs using your host programming language, or ask CouchDB to generate one or more UUIDs for you using the _uuids resource:

GET https://xxx.cloudant.com/_uuids?count=5
{"uuids":["648961210dab8fdffac52cc2f28e143e",
          "648961210dab8fdffac52cc2f28e200f",
          "648961210dab8fdffac52cc2f28e2d2e",
          "648961210dab8fdffac52cc2f28e3263",
          "648961210dab8fdffac52cc2f28e3997"]}
Then, you can create the document using a PUT request with the UUID specified in the URI:
Request:
PUT https://xxx.cloudant.com/blogdb/648961210dab8fdffac52cc2f28e143e
{ "customer" : "c1" ...}

Response:
{"ok":true,"id":"648961210dab8fdffac52cc2f28e143e","rev":"1-9f1fc712b431b44ec6cf09369183a96b"}
Note that if the ID already exists, a conflict is returned:
{"error":"conflict","reason":"Document update conflict."}
Alternatively, you can let CouchDB generate the UUID for you by using a POST request, and the ID will be returned:
Request:
POST https://xxx.cloudant.com/blogdb/
{ "customer" : "c1" ...}

Response:
{"ok":true,"id":"f32fee7ca6ce5a755900525f6c87f346","rev":"1-acea01cbb45b0d08d9f534f9651ef7b1"}
UUIDs are very opaque and this is good in some cases. However, it does not help when you look at logs or lists of objects to know which object you are referencing, especially when your database has different types of documents: they will all be mixed up. Also, UUIDs may not be the best choice given the way CouchDB updates its B-tree indexes (see comments and some tests).

Finally, generating sequence numbers such as 1, 2, 3... is generally difficult in a distributed environment, as it requires some synchronization. Such numbers are usually nice for end users, but not a good practice for a scalable implementation.

With this in mind, I recommend creating document IDs as follows, although this depends on your application and performance needs:

  • Include the document type
  • Include a related identifier such as a user ID or another document ID
  • Include a timestamp such as the number of milliseconds since the epoch
For example, the following ID meets my requirements so far: order.1SXDGF.1412020886716. I should add some performance benchmarks later.
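As a sketch, such an ID can be built like this in Java (userId stands for whatever related identifier you want to embed):

// document type prefix + related identifier + timestamp in milliseconds
String docId = "order." + userId + "." + System.currentTimeMillis();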

Wednesday, September 24, 2014

What is the best granularity of CouchDB databases?

After the granularity of documents, which I covered in my previous post, the typical next question is how to organize the various document types. Is it better to store documents in different databases, or to group them in a single one?

CouchDB does not have a built-in notion of document type. As a comparison, MongoDB lets you organize documents into collections and gives you basic functions to manage them (insert, remove, find, drop, etc.). There is nothing like that in CouchDB: a database can contain documents of any shape or form. You then need to use views to access a subset of documents, using any criteria you like. However, a commonly used pattern is to make sure all the documents have a "type" attribute. You can then use this type to create views that return a subset of the documents. For example, if you have documents for orders and line items, you can access them by type using this design document:

   {
     "language": "javascript",
     "views": {
       "orders": {
         "map": "function(doc) { if (doc.type == 'order')  emit(null, doc._rev) }"
       },
       "items": {
         "map": "function(doc) { if (doc.type == 'item')  emit(null, doc._rev) }"
       }
     }
   }

When storing different document types in a single database, you also need to make sure that document IDs cannot collide. One approach that I recommend is to use the document type as a prefix of the document ID. Deleting all or a subset of the documents of a given type can then be achieved with the bulk API, by sending the document IDs and revisions with the _deleted attribute set to true. However, it requires getting the complete list of document IDs and revisions before deleting, and the bulk update can fail due to conflicts.
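For example, if the order documents are prefixed with order., you can list their IDs and revisions with a ranged _all_docs query before building the bulk delete request (a sketch; the keys are JSON strings that must be URL-encoded, and the high Unicode character in endkey is the usual trick to cover every ID starting with the prefix):

GET https://xxx.cloudant.com/mydb/_all_docs?startkey="order."&endkey="order.\ufff0"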

On the other hand, you can create separate databases and store a different document type in each. The major drawback of this approach is that you cannot create views across databases. For example, if you want to list all orders and items of a given customer, using different databases for orders and items will be a problem. You may also want to set up replication later on, and you will then need to create and monitor more replications. But if the data is really not related, using different databases is a nice option. Based on other posts, CouchDB should handle a large number of databases, even if some configuration may be necessary (see this post).

The question of database granularity also comes up with the multi-tenancy requirement. Is it better to create a single database shared by tenants, or to create a database per tenant? I distinguish between two multi-tenancy use cases:

  • (a) A lot of the infrastructure is shared between tenants and you need to monitor the activity of tenants globally. In this case, using a single database is better, and you need to make sure that all documents have a tenant attribute.
  • (b) Each tenant needs to store data of its own and you do not need to aggregate data from different tenants in reports. In this case, using a database per tenant is better.

In conclusion, I recommend using a single database to store the various document types so that you can create views and manage your data with more flexibility. If your concern is multi-tenancy, use a single database or a database per tenant depending on whether you need to aggregate data from different tenants.

Monday, September 22, 2014

What is the best granularity of CouchDB documents?

Like many of us, I have a background in relational databases, but it is time to understand the new use cases and technologies brought by NoSQL databases. There are many flavors of NoSQL databases, and I am starting a series of posts about CouchDB to share my experience on several key subjects. CouchDB is a JSON document database, and probably one of the first questions anybody will ask is about the granularity of documents. Is it better to define coarse-grained documents with many attributes and sub-objects, or to define smaller ones? In order to answer this question in the context of CouchDB, I will first define the three design forces that must be balanced: unity, size and concurrency.

The typical design process for a relational database is to start with a well-defined entity-relationship model, then derive the normalized table representation, and later denormalize on a case-by-case basis for performance reasons. Using the relational model, you end up creating tables with a very fine-grained representation, and you do not distinguish between tables that would typically be accessed together and the ones that are semantically more distant. The application then has to join data from the tables to get it back in a more meaningful way. This is where I see the main advantage of a document database: you can keep semantically coupled data together in a single document. This is what I call unity. For example, an order having several line items could be stored like this:

   { _id : "order1",
     type : "order",
     customer : "c1",
     items: [
       {product : "p1", quantity : 1},
       {product : "p2", quantity : 5},
       {product : "p3", quantity : 2}
     ]
   }

The second force that you need to balance is the size of documents. Indeed, each time a document is changed, the whole document must be exchanged between the server and the client. There is no partial update as there is with MongoDB. It is difficult to define a size limit, but if you have a lot of text, for example, then decomposing into smaller documents will be better.

Finally, you need to think about concurrency. CouchDB concurrency control is based on document revisions, and there are no transactions. Each document has a revision, and updating a document is just about creating a new document identified by the same ID but carrying a new revision. When the application needs to update a document, it has to provide the revision it wants to update. If the document has already been changed by another part of the application, the revision has already moved on and the new update is rejected with a conflict error. In this case, you may want to retry your update after potentially merging with the already updated document.

However, a highly concurrent application updating the same documents quickly becomes a big problem and will not work. You then have two major options: either you design your application so that you create a new document at each update, or you decompose your documents based on access patterns. With the latter option, you need to realize that concurrent updates may not always touch the same part of the document, so decomposing the document into several small documents reduces the conflicts. In the example above, if line items are frequently changed by different parts of the application concurrently, creating a document for each line item may be necessary. Accessing the line items of an order will then require a view that fetches the items.

   { _id : "order1",
     type : "order",
     customer : "c1"
   }
   { _id : "order1.item1",
     type : "item",
     order: "order1",
     product : "p1", 
     quantity : 1
   }
   { _id : "order1.item2",
     type : "item",
     order: "order1",
     product : "p2", 
     quantity : 5
   }
   { _id : "order1.item3",
     type : "item",
     order: "order1",
     product : "p3", 
     quantity : 2
   }
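Here is a sketch of the map function of such a view, keyed by the order ID so that all the items of an order can be fetched in a single query (for example with ?key="order1"):

function(doc) {
  // index line items by the order they belong to
  if (doc.type == 'item') {
    emit(doc.order, doc);
  }
}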

As you can see, unity and concurrency are conflicting forces in the case of CouchDB. It is not necessarily the case with other document databases: MongoDB has the atomic find-and-modify operation, which helps a lot because you do not need to decompose the update of a document into two steps (getting the revision, then sending the update). One question is why CouchDB does not provide such an atomic operation.

With this in mind, if the typical size of the data is reasonable, I recommend starting with coarse-grained documents and decomposing them on a case-by-case basis according to concurrency needs. Otherwise, think about smaller documents right away.

Thursday, September 18, 2014

How to enforce secured connections with IBM Bluemix?

IBM Bluemix has a DataPower appliance in front of all deployed applications (see Bluemix security). In particular, the appliance terminates secured connections, so that they are forwarded to the application as non-secured connections. For example, all HTTPS traffic is forwarded as HTTP traffic to your applications.

This is very nice because you have nothing to configure in your app to accept secured connections. In addition, all the compute power needed to decrypt messages is spent on the DataPower appliance, not on your application server instance.

However, it has one very important drawback: it does not enforce the use of secured connections. For example, a client application could use HTTP where HTTPS should have been used. This would be a major issue when using Basic Authentication, or anytime an Authorization header or access token is used.

Fortunately, the DataPower appliance sets some interesting headers to indicate several attributes of the original connection. One of them is the $WSIS header, which indicates whether the original connection was secured. With this in mind, we can easily write a servlet filter like this:


  @Override
  public void doFilter(ServletRequest request, ServletResponse response,
      FilterChain chain) throws IOException, ServletException {
    if (request instanceof HttpServletRequest && response instanceof HttpServletResponse) {
      HttpServletRequest req = (HttpServletRequest) request;
      HttpServletResponse res = (HttpServletResponse) response;
      // $WSIS is set by the DataPower appliance and tells us
      // whether the original connection was secured
      String wsis = req.getHeader("$wsis");
      if (wsis != null && !wsis.equalsIgnoreCase("true")) {
        // the original connection was not secured: reject the request
        res.setStatus(403);
        return;
      }
    }
    chain.doFilter(request, response);
  }
With this filter, a 403 (FORBIDDEN) status is returned if a non-secured connection was used, so secured connections are enforced when the application is deployed to Bluemix. In addition, when you test your application locally, you can still use non-secured connections because the $WSIS header will not be present.
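For completeness, here is how such a filter could be registered in web.xml (the filter and class names are my own example):

<filter>
  <filter-name>SecuredConnectionFilter</filter-name>
  <filter-class>com.mycloudtips.filter.SecuredConnectionFilter</filter-class>
</filter>
<filter-mapping>
  <filter-name>SecuredConnectionFilter</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>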

To conclude, I recommend using this simple filter.