Optimize a microservices workflow application architecture with the Camunda Workflow Engine

Introduction

This article explores a project I worked on that modernized a complex workflow application and migrated it onto the Red Hat OpenShift Container Platform. Its main purpose is to show how you can use the open source Camunda BPM Workflow Engine to optimize the architecture of a complex workflow application composed of multiple services. The proposed architecture can modernize a legacy centralized workflow application so that it makes full use of a container platform like OpenShift and better satisfies its nonfunctional requirements.

Problem description

In my migration project, the entire workflow consisted of more than 20 independent steps, and the business process of each step was complicated. To improve development efficiency, keep teams at a reasonable size, and avoid inefficient communication, we developed each step in the workflow as a microservice. An orchestration service was responsible for controlling the execution state of the entire workflow and for orchestrating all microservices. This article simulates the architecture of that project with a simplified fictional workflow application and then discusses how to optimize its architecture with the Camunda BPM Workflow Engine.

The fictional payment workflow starts with 2 steps named Check Credit Card and Check Item Info that run in parallel to validate the card and item information. If all checks return OK, the Charge Credit Card step performs the charge with the card and item information. Figure 1 shows this process:

Figure 1. Sample payment workflow

To implement this workflow application, a total of 4 services are required, listed in the following table:

| Service | Function |
| --- | --- |
| Payment | Interface for kicking off the workflow and overall service orchestration and workflow execution state management |
| Check Credit Card | Worker microservice: Check whether credit card is valid |
| Check Item Info | Worker microservice: Check whether item information is correct |
| Charge Credit Card | Worker microservice: Undertake charge operation to the credit card |

To get a better understanding of the workflow, refer to the GitHub repository of the demo for this article.

You start with the original architecture that includes 4 services deployed on 2 servers. All services are developed as RESTful web services with JAX-RS:

  • The Payment service is deployed on Server A.
  • The remaining 3 microservices are deployed on Server B.
  • The Payment service calls others through a synchronous REST API.
  • Each service uses a domain object as an API parameter.

The entire workflow control logic is implemented in the Payment service, as shown in Figure 2:

Figure 2. Original architecture

The external system starts the following workflow:

  1. External system calls the Payment service with credit card and item information.
  2. The Payment service calls Check Credit Card and Check Item Info concurrently to do data validation.
  3. When all responses are OK, the Payment service calls Charge Credit Card to complete the charge action and return the final result to the external system.
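
The three steps above can be sketched as follows. This is a minimal illustration, not the code from the demo repository: the class name `PaymentOrchestrator` is hypothetical, and the functional interfaces stand in for the synchronous REST calls to the worker microservices.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.BiPredicate;
import java.util.function.Predicate;

/** Sketch of the centralized orchestration inside the original Payment service. */
final class PaymentOrchestrator {
    // Each field stands in for a synchronous REST call to one worker microservice.
    private final Predicate<String> checkCreditCard;
    private final Predicate<String> checkItemInfo;
    private final BiPredicate<String, String> chargeCreditCard;

    PaymentOrchestrator(Predicate<String> checkCreditCard,
                        Predicate<String> checkItemInfo,
                        BiPredicate<String, String> chargeCreditCard) {
        this.checkCreditCard = checkCreditCard;
        this.checkItemInfo = checkItemInfo;
        this.chargeCreditCard = chargeCreditCard;
    }

    /** Returns true only if both checks pass and the charge succeeds. */
    boolean pay(String card, String item) {
        // Step 2: run both validation calls concurrently.
        CompletableFuture<Boolean> cardOk =
                CompletableFuture.supplyAsync(() -> checkCreditCard.test(card));
        CompletableFuture<Boolean> itemOk =
                CompletableFuture.supplyAsync(() -> checkItemInfo.test(item));

        // Wait for both results. If either call throws, join() rethrows the
        // failure as a CompletionException and the whole workflow fails.
        boolean allOk = cardOk.thenCombine(itemOk, (a, b) -> a && b).join();

        // Step 3: charge only when every check returned OK.
        return allOk && chargeCreditCard.test(card, item);
    }

    public static void main(String[] args) {
        PaymentOrchestrator payment =
                new PaymentOrchestrator(c -> true, i -> true, (c, i) -> true);
        System.out.println("Payment succeeded: " + payment.pay("card-1", "item-1"));
    }
}
```

Even in this small sketch, the state handling, the waiting, and the failure propagation all live in the Payment service.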

Figure 3 shows this workflow as a sequence diagram:

Figure 3. Original sequence diagram

For this architecture, you could run into the following issues when migrating to a container platform like OpenShift:

  • The REST API calls go from the Payment service to the other microservices, so Payment depends strongly on every service it calls. This inevitably concentrates too many responsibilities in the Payment service. In addition to the workflow control logic itself, Payment must also deal with the return status of each called microservice, with error handling, and with retries to improve fault tolerance. Having to account for every possible status of the called microservices increases complexity and reduces maintainability. If you consider the responsibilities of Payment carefully, the workflow control logic should be its core responsibility. Apart from the calls to other microservices themselves, error handling and retry logic should be isolated from that core logic to achieve true high cohesion.
  • Scalability is not ideal. The three microservices Check Credit Card, Check Item Info, and Charge Credit Card are deployed on the same server, and if one service suffers a high workload, it is hard to scale out for just that service.
  • Availability and resilience are inherently weak. Because services communicate through a synchronous REST API, an error in any of the three worker microservices called by Payment causes the entire workflow to fail, and the error is returned to the client or external system that initiated the workflow, which hurts the user experience. To make matters worse, users who see an error are likely to resubmit their requests repeatedly, which increases the load on the server side and makes it much harder for the application to recover. In the original deployment, the Check Credit Card, Check Item Info, and Charge Credit Card microservices run on the same server and are therefore considered to have the same availability, so the strong coupling between microservices poses less of an availability risk there. That assumption no longer holds once each service runs in its own container.
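
To make the availability concern concrete, here is a small worked calculation. It assumes that each service fails independently once it runs in its own container and that a request succeeds only if every service in the synchronous call chain is up. The 99.9% figures and the class name `SerialAvailability` are illustrative assumptions, not measurements from the project.

```java
/** Sketch: availability of a request that needs every service in a chain to be up. */
final class SerialAvailability {

    /** Multiplies the individual availabilities, assuming independent failures. */
    static double combined(double... availabilities) {
        double result = 1.0;
        for (double a : availabilities) {
            if (a < 0.0 || a > 1.0) {
                throw new IllegalArgumentException("availability must be in [0, 1]: " + a);
            }
            result *= a;
        }
        return result;
    }

    public static void main(String[] args) {
        // Payment plus the three worker microservices, each assumed to be 99.9% available.
        double chain = combined(0.999, 0.999, 0.999, 0.999);
        System.out.printf("Combined availability: %.4f%%%n", chain * 100);
    }
}
```

Under these assumptions, four services at 99.9% each give a combined availability of roughly 99.6%, lower than that of any single service.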

Architecture optimization

To address the strong dependency between services and the low cohesion of the Payment service, you can reverse the dependency between services. That is, instead of Payment depending on the worker microservices, each worker microservice depends on Payment. Logically, Payment, as the implementation of the workflow control logic, is better suited to be the upstream service, and each worker microservice is better suited to be a downstream service that waits for instructions from Payment. To put it more concretely: in the original architecture, Payment directly initiates the execution of each worker microservice through the REST API. After the dependencies are reversed, Payment only issues commands and places them in a designated location, and each worker microservice polls that location in Payment to determine whether it needs to start an execution.

You can address the scalability limitation by containerizing each service and deploying it as an independent container. After containerization, each service can be scaled out independently according to its own load.

As for low availability, after the dependency between services is reversed, the availability of Payment is no longer limited by the availability of the worker microservices. Even if one or all worker microservices are temporarily unavailable, the Payment service itself keeps working normally and can continue to accept requests; the execution of the workflow simply pauses at the step whose worker microservice is unavailable. The user experience improves because failure information is no longer exposed directly to the end user, which in turn removes the trigger for repeated retry requests.
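
The reversed dependency can be sketched with an in-memory stand-in for the "designated location" that workers poll. This is a simplified illustration under stated assumptions: the class `TaskBoard`, its methods, and the topic name are hypothetical, and a real engine would also persist tasks and lock them for a worker.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Sketch: Payment publishes tasks per topic; workers pull them when they are ready. */
final class TaskBoard {
    private final Map<String, Queue<String>> tasksByTopic = new ConcurrentHashMap<>();

    /** Called by Payment: records a task and returns immediately, whether or not a worker is up. */
    void publish(String topic, String payload) {
        tasksByTopic.computeIfAbsent(topic, t -> new ConcurrentLinkedQueue<>()).add(payload);
    }

    /** Called by a worker: takes the oldest waiting task for its topic, if any. */
    Optional<String> poll(String topic) {
        Queue<String> queue = tasksByTopic.get(topic);
        return queue == null ? Optional.empty() : Optional.ofNullable(queue.poll());
    }

    /** Number of tasks still waiting for a worker on the given topic. */
    int pending(String topic) {
        Queue<String> queue = tasksByTopic.get(topic);
        return queue == null ? 0 : queue.size();
    }

    public static void main(String[] args) {
        TaskBoard board = new TaskBoard();
        // The worker is down, but Payment can still accept requests.
        board.publish("check-credit-card", "order-1");
        board.publish("check-credit-card", "order-2");
        System.out.println("Waiting tasks: " + board.pending("check-credit-card"));
        // The worker comes back and drains the backlog in order.
        board.poll("check-credit-card").ifPresent(t -> System.out.println("Handling " + t));
    }
}
```

Payment never calls a worker here, so a worker outage delays the affected step instead of failing the request.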

Figure 4. Optimized architecture

After the dependencies between services are reversed, the original synchronous REST API is no longer a suitable communication method: a worker microservice cannot know in advance whether Payment has a task for it, so it cannot determine when to call Payment. An asynchronous, publish-subscribe messaging mechanism fits the new relationship between Payment and the worker microservices better. Implementing such a mechanism usually requires a message bus component such as RabbitMQ or Kafka, which increases the complexity of the system and the cost of operating and maintaining it after the migration. In addition, because the communication mechanism changes fundamentally, the workflow control logic of Payment must also change fundamentally, which is almost equivalent to a rewrite and further increases the cost of migration. Is there a way to achieve the optimized architecture without increasing the cost of migration and operation?

Camunda BPM Workflow Engine

Camunda is a Java-based framework supporting BPMN for workflow and process automation, CMMN for case management, and DMN for business decision management. Camunda offers a wide range of components, and I highlight a few here, along with some typical user roles:

  • Camunda Modeler: Business analysts and developers create workflow definitions.
  • Camunda Tasklist/Custom Application: Users can inspect their workflow tasks and work on the tasks.
  • Camunda Cockpit: System operators monitor and operate on workflow tasks.
  • Camunda Admin: System administrators manage users, groups, and authorizations.

Figure 5 shows these components and user roles:

Figure 5. Camunda architecture overview

Camunda itself has many functions, but this article focuses on the external task feature. You can read about understanding and using external tasks.

External tasks fulfill the asynchronous communication requirements of the optimized architecture shown in Figure 4, and at the same time they eliminate the need to deploy a specialized message bus component. The trick is that an external task is itself implemented on top of a REST API, yet it provides the decoupling effect of a message bus. Long polling is also supported, so clients do not have to poll constantly, which reduces the number of HTTP requests.
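
To show what the REST-based mechanism looks like on the wire, the following sketch builds (but does not send) a long-polling request for the engine's `fetchAndLock` endpoint using only the JDK HTTP client. The base URL, worker ID, topic name, and timeout values are assumptions chosen for illustration, and the class `FetchAndLockRequest` is hypothetical. In practice the Camunda External Task Client issues these requests for you.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

/** Sketch: a long-polling fetchAndLock request against the Camunda REST API. */
final class FetchAndLockRequest {

    /**
     * Builds the JSON body. asyncResponseTimeout turns on long polling: the engine
     * keeps the request open until a task is available or the timeout expires.
     * The IDs are assumed to contain no characters that need JSON escaping.
     */
    static String body(String workerId, String topic, long lockMillis, long longPollMillis) {
        return "{"
                + "\"workerId\":\"" + workerId + "\","
                + "\"maxTasks\":1,"
                + "\"asyncResponseTimeout\":" + longPollMillis + ","
                + "\"topics\":[{\"topicName\":\"" + topic + "\",\"lockDuration\":" + lockMillis + "}]"
                + "}";
    }

    /** Wraps the body in a POST request; the client timeout must exceed the long-poll timeout. */
    static HttpRequest build(String engineRestBaseUrl, String jsonBody, long longPollMillis) {
        return HttpRequest.newBuilder(URI.create(engineRestBaseUrl + "/external-task/fetchAndLock"))
                .header("Content-Type", "application/json")
                .timeout(Duration.ofMillis(longPollMillis + 5_000))
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        // Assumed values: a local engine, one worker, a 20 s lock, a 10 s long poll.
        String json = body("check-credit-card-worker", "check-credit-card", 20_000, 10_000);
        HttpRequest request = build("http://localhost:8080/engine-rest", json, 10_000);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(json);
    }
}
```

The worker always initiates the HTTP call, so the engine never needs to know where the workers run or how many instances exist.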

For the 4 RESTful web services implemented based on JAX-RS in the previous sample application, you can modify them to adopt external tasks.

For the Payment service:

  1. Implement the embedded process engine. Camunda Workflow is lightweight and can be started as part of the JAX-RS web service.

  2. Use the Camunda Modeler to make a workflow definition file, as shown in Figure 6. Because you need to implement the External Task pattern, you need to set the task in the definition as external with a unique topic name.

    Figure 6. Workflow definition in Camunda Modeler

  3. Delete the original Java code for the workflow control logic and the REST API calls to each worker microservice, and add code that starts the Camunda Workflow Engine process. This way, no Java code remains for the state management of the workflow; the Camunda Workflow Engine takes care of the workflow execution. Using the graphical Camunda Modeler also greatly improves development efficiency for complex workflow processes.

For Worker microservices (Check Credit Card, Check Item Info, Charge Credit Card):

  1. Import the Camunda External Task Client, which supports Java and Node.js.

  2. Delete the original REST API interface, and add code that subscribes to the external task topic at startup together with a handler method that is invoked when a task arrives. For the worker microservices, only the interface needs to change; the original business processing logic stays the same, so the migration effort is small.
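
The worker-side change can be sketched as a thin adapter around unchanged business logic. To stay self-contained, this example does not use the Camunda client library: the `Function` over variable maps is a stand-in for the client's task handler callback, and the class names, variable names, topic name, and the 16-digit validation rule are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Business logic of the Check Credit Card worker; unchanged by the migration. */
final class CreditCardChecker {
    /** Placeholder rule for illustration only: a card number is exactly 16 digits. */
    boolean isValid(String cardNumber) {
        return cardNumber != null && cardNumber.matches("\\d{16}");
    }
}

/**
 * New interface layer: replaces the former REST endpoint. It reads the task's
 * input variables, calls the existing business logic, and returns the result
 * variables that would be reported back to the engine on completion.
 */
final class CheckCreditCardTaskHandler
        implements Function<Map<String, Object>, Map<String, Object>> {

    /** Topic the worker would subscribe to; must match the topic set in the workflow definition. */
    static final String TOPIC = "check-credit-card";

    private final CreditCardChecker checker = new CreditCardChecker();

    @Override
    public Map<String, Object> apply(Map<String, Object> taskVariables) {
        String cardNumber = (String) taskVariables.get("cardNumber");
        Map<String, Object> result = new HashMap<>();
        result.put("cardValid", checker.isValid(cardNumber));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> input = new HashMap<>();
        input.put("cardNumber", "4111111111111111");
        System.out.println(TOPIC + " -> " + new CheckCreditCardTaskHandler().apply(input));
    }
}
```

Only the adapter knows about tasks and variables, which is why the business logic class needs no changes when the interface moves from REST endpoints to external tasks.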

After completing this migration, you can deploy and run all applications as containers. You can even use Camunda web apps to monitor the running status of the workflow. For more details about the migrated implementation, refer to the GitHub repository of the demo.

Figure 7 shows the architecture implemented using the Camunda BPM Workflow Engine:

Figure 7. Optimized architecture with Camunda BPM Workflow Engine

Summary

This article introduced a method for optimizing the architecture of a workflow application composed of multiple microservices by using the External Task pattern of the Camunda BPM Workflow Engine. The proposed architecture can improve cohesion, reduce the coupling between services, improve application availability and scalability, simplify development, and make the entire application better suited to operation and maintenance on a container platform. You can use it as a reference architecture when designing a new workflow application or when modernizing a legacy workflow application to run on a container platform like OpenShift or Kubernetes.