By Dr. André Fachat | Updated February 14, 2019 - Published January 30, 2019
In Part 1 of this article, I have given an overview on challenges when implementing Microservices. In this part I will give some best practices for developing microservices at scale and re-evaluate the promises of Microservices against what we learned.
The new architectural style used for microservices removes capabilities that were used in traditional applications, most notably support for distributed transactions and shared session state. Also, the components within an application become more independent, so best practices previously more used between applications now need to be implemented between the microservices within an application.
Here is a list of practices that you should use in microservices-style applications:
Look out for other microservice best practices like the 12-factor-app, to avoid unwanted dependencies.
In general, try to look at modern tooling to see what fits your application best. kafkaorderentry is an (maybe overly extreme) example how to build an order management system based on the kafka event processing platform instead of classical (relational or even NoSQL) databases. It does not always have to be classical approach.
The “principle” (if not “dogma”) of microservices is to use the “best technology” for a given purpose. This in principle makes sense, but in those cases I found the criteria for the decision which would be the “best” technology did not take into account the efforts of operations and support. That resulted in a zoo of tools, with multiple different tools for the same purpose within one application. Also, it costs a lot of effort for every team to develop and manage their own build and deployment pipeline, only to unify and consolidate them later.
To overcome this effort, even a microservices project should consider the run and operations cost for each selected tool and runtime. To reduce this cost, experience from other projects, or at least a common denominator setup, should be re-used and only exceptions should be managed when necessary. A microservices development team, even if freed from the usual standards and regulations to provide quick business value, does not need to reinvent the wheel.
Microservices provide and consume stable, versioned APIs, so that each service can evolve independently as needed. A small focused team (the proposed measure is the “two pizzas” size of 5-7 persons) is the ideal development organization, responsible for the full stack: from database, over service layer, to service interface or even user interface. This proves to be a challenge, as the required skill set increases compared to a classical team-per-layer approach.
In some cases, a feature requires coordinated effort across multiple Microservices. A data model change may ripple through several services. An application may be just too large to be implemented in a single team.
Beware of Conway’s law! It states that the architecture resembles the organization structure of your development teams. It is one reason to argue for microservice development teams being responsible for the full stack, and in this case increases the speed of development and innovation, e.g., as seen on teamorg. However, when you need coordination and communication across microservices, this becomes more difficult.
In such cases, the approach often is to “go waterfall,” a change in one service in the first sprint, then the change in another service in its next sprint. However, this slows development down and leads to increased efforts in case there are feedback loops needed to change something that has been done in an earlier sprint (like adding a field that has been missed earlier).
Teams implementing different microservices must be able “to talk” to each other during a sprint, so a change can be implemented parallel in a coordinated way. If you cannot have short-lived “feature teams” across the microservices teams, at least nominate someone in each team to be responsible for the communication and coordination with the other teams for that specific feature. Ensure that project planning takes into account the extra communication and coordination effort for cross-microservice activities.
Also make sure that the code is integrated across teams and tested as often as possible, if not multiple times a day, to “fail fast” and to fix any problems that arise quickly.
Testing an application or microservice that runs 24h/7d that’s deployed regularly, maybe even multiple times a day, only works when tests are automated. Test automation and integration into the build and deployment pipeline is key here. Make sure you have a testing strategy from the beginning, to setup your test pipeline right from the start – changing tests just to fit an approach introduced too late is extra effort that can be avoided. (Check out “CI/CD in the enterprise is possible (and a good idea)” for tips on continuous testing.)
Test data generation should also be considered. This can either be synthetic data, or an anonymized set of production data (although GDPR and data privacy may restrict its use). The latter especially provides a larger and more realistic variability and thus better test data quality. Each problem that has been fixed should potentially become a new set of test data.
Non-functional testing is essential as well. Non-functional requirements are often considered as “implemented because we use <your favorite=”” scalability=”” pattern,=”” database=”” or=”” event=”” processing=”” tool=””>.” However, without a test to prove the system quality for the non-functional requirements, it should be considered as not implemented. Non-functional testing here not only means performance (response times), throughput (load), but also availability and resilience testing. While performance and throughput can be automated rather easily, resilience testing is more challenging. Netflix for example has automated their availability and resilience testing with their “Chaos Monkey” that randomly disables computers in their production(!) infrastructure to see how the other services react.
Even the best load and resilience testing may still overlook a situation that could potentially render the system unavailable or unresponsive. A good way to reduce the risk of such situations is to do Canary Releases, where they only gradually move the users from the old version of a service to a new version. Note that this could be done by routing requests to different instances of a service that have different versions.
To support the testing and error analysis (in test as well as in production), the system should be able to trace requests across microservices, such as using a correlation ID and opentracing.io and zipkin tools. An ELK-stack (Elasticsearch-Logstash-Kibana) or something similar should collect the logs and enable the team to search and filter logs without having to manually copy them from the servers. Prometheus could be used to monitor the infrastructure and applications, with Grafana to provide easy to use time-series dashboards of system parameters.
The development approach that’s used for microservices is usually agile, where changes can be developed and deployed quickly. The coordination of multiple microservices requires an organizational structure that ensures:
You should employ agile frameworks that are made to work at scale, be it “scrum of scrums,” or the “Scaled Agile Framework” (SAFe). For example, the role of the System and Solution architect is to enable the function for the development process overlooking the system as a whole.
Use a DevOps approach to quickly get feedback from production and end users. Support this approach by implementing monitoring tooling for the application. Here, synergy with testing can be achieved if the tools used in testing, like ELK, Prometheus and opentracing are used in production as well (also see the “Smarter Monitoring” approach).
A microservices approach is that “you build it, you run it.” But even if the microservices team is available on call in the beginning, prepare that at some point the maintenance will be taken over by another team. So make sure documentation and analysis tooling, like logging and monitoring support is “good enough” to enable outside developers understand and analyze the microservice. For example, ensure that architectural decisions and the rationale behind them is properly documented. Otherwise, such decisions become carved in stone, as after some time no one knows why they have been made the way they are, and no one is brave enough to challenge it. Or the other way around, mistakes from the past are repeated.
Now that we’ve described the promises of microservices, let’s have a look and evaluate these promises with what we have learned:
One main reason for delays and cost overruns in microservices projects probably is expectation management, where a service can be developed quickly, but it may need to evolve in several steps until it can fulfill the full production requirements. Friction between multiple microservices teams is not seen and coordination cost not recognized.
Microservices are a push to properly modularize your applications. If you’re already doing this in your monolith, the difference is the granularity of deployment. This granularity allows better scalability, reuse and resilience. On the other side a “monolith” can benefit from the infrastructure developed for microservices, like containerization, service mesh, autoscaling approaches, sidecars with circuit breakers or request rate limiting. The key indeed is proper modularization where even a monolith can be broken up later if some services for example need to be reused.
The microservices approach proposes that each team is responsible for its service. But if in your traditional approach the UI team does not “talk” to the service team, who does not talk to your infrastructure team, expect problems with one microservices team not communicating with another microservice team. In the end, a developer has two frames of responsibility: the vertical one to deliver business functionality and the horizontal one to deliver the necessary non-functional qualities. If you don’t take care of both of them in your organizational setup, Conway’s law will hit you either way. If you choose the monolith approach, define vertical sub-teams responsible to implement specific business functionality. If you choose the microservices approach, define horizontal teams responsible to ensure the non-functional qualities. (Hint: guilds are a start, but every team should participate, and the guilds should have authoritative power too.)
A large and complex application still stays large and complex, no matter what approach. The implementation teams need to talk to each other in a matrix kind of way, no matter what approach. Microservices are no silver bullet. In the end, no matter what approach, you have to THINK.
A great course to learn about application architecture using the Microservices approach on the cloud is the “IBM Cloud Garage Application Architect Bootcamp” course.
Many thanks go to my colleagues Christian Bongard, Wilhelm Burtz, Jim Laredo and Doug Davis, who have reviewed the article and commented on it.
In spite of the drawbacks, microservices continue to be popular with developers and enterprises as it greatly benefits their application…
Get the Code »
Microservices are definitely the hot new thing in commercial application development. The term microservice has replaced Agile, DevOps, and RESTful…
Back to top