Challenges and benefits of the microservice architectural style, Part 1
Challenges when implementing microservices
Microservices are an approach for developing and deploying services that make up an application. Microservices, as the name implies, suggest breaking down an application into smaller “micro” parts, by deploying and running them separately. The main benefit promised by this approach is a faster time-to-market for new functionalities due to better technology selection and smaller team sizes.
Experience shows, however, that microservices are not a silver bullet in application development, and projects can still fail or take much longer than expected.
In the first part of this article, I want to point out the pitfalls and challenges associated with using microservices. In the second part, I will give my opinion and experience on how you can set up your microservices to avoid these pitfalls, and make your microservices project a success.
The original value proposition promised by using microservices can be seen in Werner Vogels’ presentation on the evolution of the Amazon architecture. Amazon was suffering from long delays in building and deploying code, and from “massive” databases that were difficult to manage and were slowing down growth and the delivery of new features.
The approach that Amazon took was to break up the application into (then-called) “Mini Services” that interacted with other services through well-defined interfaces. Small teams developed the microservices, which kept organizational overhead small. Technology selection was up to the team, so the best technology for each job could be used. The team would also host and support their microservices. That, however, meant each team duplicated the effort of solving the same problems (e.g., availability). This operational work slowed development again, so to reduce the work required of each team, Amazon looked toward the cloud business.
So, the main value proposition for Amazon to use microservices seemed to be:
- Time-to-market: Quickly get new features live
- Scale with increasing load requirements
According to Vogels, the mission from Jeff Bezos was: “Get big faster is way more important than anything else.”
So what do these microservices look like? In 2014, Martin Fowler defined the microservices architectural style, which in his view was already in use but had not yet been precisely defined.
“In short, the microservice architectural style [..] is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery. There is a bare minimum of centralized management of these services, which may be written in different programming languages and use different data storage technologies.” – Martin Fowler
Microservices are often seen as a counter-approach to the so-called “monolith.” A “monolith” is usually run in a single process, where the services implemented inside are not exposed – and thus not reusable – on the outside.
What are the real benefits of microservice architectures compared to monoliths? Looking at some of the lists on the web, there are such things as “every service can be in a different technology” – but is that a benefit? What is the business case behind this? Let me give you my understanding of the benefits that a microservices approach promises:
- Promise of agility, and faster time-to-market. Achieved, for example, because microservices are easier to understand, easier to enhance in smaller pieces, easier to deploy, and data management is decentralized and thus more agile.
- Promise of more innovation achieved by using the best technology for each problem, and by making the switch between technological bases easier. No long-term commitment to some specific technology is necessary.
- Promise of better resilience, such as better fault isolation and less impact on other services when an error happens, achieved through resiliency patterns like bulkheads and circuit breakers. All in all, better availability through graceful degradation.
- Promise of better scalability achieved through separate deployment of services, and autoscaling features of the infrastructure, paired with resiliency patterns to avoid overload.
- Promise of better reusability, achieved through an organization around business capabilities (instead of product or platform features), and interface patterns like “Tolerant reader” or Consumer-Driven Contracts.
- Promise of improved return on investment (ROI) and better total cost of ownership (TCO), achieved through faster (thus cheaper) development and the use of cheaper commodity hardware.
While these promises may be fulfilled by smaller projects, why is it that larger projects suffer from delays and cost overruns? In the next section, I will look at the challenges that are associated with the introduction of microservices. The second part of the article continues with best practices for microservices and ends with revisiting and validating the promises that we identified here.
In this section, I will look at the challenges that projects face when they implement their system in a microservice architectural style by using multiple microservices. A single microservice project may often work well. While there are also technical challenges, many of the challenges come from classical communication problems suddenly popping up when multiple microservices need to be coordinated.
Old-style application architecture…
…prevents scaling
If you either migrate an existing application, or your development team is not experienced enough, you may end up developing a microservice by using “classical style” patterns, or even a monolith.
Such a microservice may rely on relational databases that are difficult to scale and/or complex to manage. Typically, relational databases guarantee ACID transactions, i.e., consistency, especially across multiple instances in a cluster. This increases the overhead the database incurs on a query or a modifying transaction. Keeping transaction state across multiple queries requires the database to hold conversational state (the transaction state), and that may cause operational issues, e.g., on a database restart or during maintenance.
A distributed two-phase-commit transaction may put an even larger operational burden on the resource and transaction managers involved and may prevent them from scaling successfully: a transaction manager, for example, must keep the conversational state of every resource manager that has already prepared or committed a specific transaction. Such a transaction manager, as well as the resources participating in a two-phase-commit transaction, cannot “just be restarted.” In the cloud, a restart usually means starting a new instance, which loses the conversational state of the in-progress transactions and does not free the resources locked in a transaction.
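The restart problem can be illustrated with a minimal sketch. The `Participant` and `Coordinator` classes are hypothetical, and keeping all state in memory is an assumption made purely for illustration; real transaction managers write a recovery log, which is exactly the state that makes them hard to replace with a fresh instance.

```python
class Participant:
    """Hypothetical resource manager that locks its resource once prepared."""

    def __init__(self, name):
        self.name = name
        self.locked = False

    def prepare(self):
        self.locked = True   # resource stays locked until commit (or rollback)
        return True          # vote "yes"

    def commit(self):
        self.locked = False


class Coordinator:
    """Hypothetical transaction manager holding conversational state in memory."""

    def __init__(self):
        self.prepared = []   # participants that voted "yes" in phase 1

    def phase_one(self, participants):
        for participant in participants:
            if participant.prepare():
                self.prepared.append(participant)

    def phase_two(self):
        for participant in self.prepared:
            participant.commit()
        self.prepared.clear()


db, queue = Participant("db"), Participant("queue")

coordinator = Coordinator()
coordinator.phase_one([db, queue])

# Simulate a cloud-style "restart": a fresh instance replaces the old one,
# so the in-memory list of prepared participants is gone.
coordinator = Coordinator()
coordinator.phase_two()   # commits nothing, it no longer knows the participants

still_locked = [p.name for p in (db, queue) if p.locked]
print(still_locked)
```

In this sketch both resources remain locked after the coordinator is replaced, because nobody is left to tell them to commit or roll back.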
Another aspect of classical applications is reliance on session state, or use of the Singleton pattern to synchronize multiple parallel requests. Both hinder horizontal scalability (i.e., scalability across multiple instances). Unfortunately, such a pattern is all too easily implemented; it can be as simple as a synchronized block (not all of them are problematic, but I would investigate them).
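Here is a minimal sketch of why in-process synchronization stops working once a service runs as several instances. The `InventoryInstance` class and the stock example are hypothetical; two objects in one process stand in for two separately deployed instances.

```python
import threading


class InventoryInstance:
    """Hypothetical service instance guarding stock with an in-process lock."""

    def __init__(self, initial_stock):
        self._lock = threading.Lock()   # only visible inside this instance
        self._stock = initial_stock     # state held in instance memory

    def reserve(self):
        with self._lock:                # serializes requests of THIS instance only
            if self._stock > 0:
                self._stock -= 1
                return True
            return False


# One item in stock, but the service has been scaled out to two instances.
instance_a = InventoryInstance(initial_stock=1)
instance_b = InventoryInstance(initial_stock=1)

reservations = [instance_a.reserve(), instance_b.reserve()]
print(reservations)
```

Each instance is correct in isolation, yet together they hand out the single item twice. The lock and the state would have to move to a shared component (for example, the database) for the guarantee to hold across instances.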
…is not resilient
Very often, classical applications rely on the availability of the underlying infrastructure. If the infrastructure fails, the application breaks; operations teams are notified and then need to fix the infrastructure. In some cases, resilience features such as circuit breakers or request rate management are built in, but that is not always the case. Two-phase-commit transactions, for example, rely on the availability of the resource and transaction managers. If they are not available, transactions and related resources may hang your system.
In a microservices approach, with many more deployment units and network connections than in a monolith, there are many more “moving parts” that can break. Many of these parts are also changed or restarted independently, for example when different microservices are updated by different teams. An application simply cannot rely on all of the infrastructure being available; it needs to degrade gracefully, which means implementing much more error handling code than in a monolith.
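Graceful degradation with a circuit breaker could look like the following minimal sketch. The `CircuitBreaker` class, its thresholds, and the pricing functions are hypothetical and not taken from any particular library; production systems would typically rely on a resilience library or on service mesh features instead.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: fail fast after repeated errors, retry later."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after   # seconds to stay open
        self.clock = clock
        self.failures = 0
        self.opened_at = None            # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()        # open: do not even try the remote call
            self.opened_at = None        # half-open: allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()            # degrade instead of propagating the error
        self.failures = 0
        return result


calls = {"count": 0}


def flaky_pricing():
    calls["count"] += 1
    raise ConnectionError("pricing service unavailable")


def cached_price():
    return {"price": 9.99, "stale": True}


breaker = CircuitBreaker(max_failures=2, reset_after=60.0)
results = [breaker.call(flaky_pricing, cached_price) for _ in range(5)]
print(calls["count"], results[-1])
```

After two failed attempts the breaker opens, so the remaining three requests are answered from the fallback without touching the failing service.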
The microservice paradigm allows you to use different technological bases for each service. This, however, may result in having to use different tools for the same functionality – just because a different microservice is using a different technology.
For example, one team may decide to use Cassandra as a database, another HBase on HDFS, and a third MongoDB. Unfortunately, even though all of these are “NoSQL” databases, they are not nearly as interchangeable as relational databases with their ACID guarantees. It may even be impossible to move from one database implementation type to another with the given requirements.
Also, one microservice may be written in one programming language, another microservice in a different one. There are few things you cannot do in most languages, which means the choice of a language is often based on personal preference and not on real technical requirements.
It is thus a skills problem that appears when a developer is switching teams, for example. More importantly, it becomes a problem when the application goes into maintenance mode, the teams are scaled down, and suddenly more people are needed to maintain the diverse tool and technology landscape.
So in the end, skills and runtime resources must be kept available for all of these technologies, and this technological diversity can increase cost.
Complex team communications
A well-defined, monolithic application may not be as monolithic as the name implies. “Monoliths” can be properly modularized into domains, layers, and implemented as a kind of “set of microservices” that are just deployed in a single Java
The idea of a microservice is to be cut along business functionality. But in larger applications, you will inevitably have services that provide more basic functionality (e.g., database or logging), basic functional services (e.g., shopping cart or pricing), a choreographic layer (e.g., a service that choreographs cart and pricing to prepare an order), and maybe a backend-for-frontend (BFF) for different UI channels. So there will be dependencies between microservices.
Another dependency is introduced by using a shared data model. A data model is usually shared at least between a service consumer and a service provider, but often data objects are shared across multiple services, like master data. Consumer-driven contracts help decouple a service provider and consumer, to make it easier for a provider to understand what changes are compatible with a consumer. However, service design is still difficult, and sometimes more error prone than in classical environments where a “red cross” during compilation immediately tells you something went wrong.
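The two decoupling ideas mentioned here, a tolerant reader on the consumer side and consumer-driven contracts on the provider side, can be sketched as follows. The field names, the consumer names, and the `CONSUMER_CONTRACTS` registry are hypothetical; real projects would usually use a dedicated contract-testing tool rather than a hand-written check.

```python
import json


def read_cart_total(payload: str) -> float:
    """Tolerant reader: pick only the field this consumer needs, ignore the rest."""
    data = json.loads(payload)
    return float(data["total"])   # accepts a number or a numeric string


# Hypothetical registry: each consumer declares the fields it relies on.
CONSUMER_CONTRACTS = {
    "checkout-ui": {"total"},
    "invoice-service": {"total", "currency"},
}


def broken_consumers(response: dict) -> list:
    """Provider-side check: which consumers would a given response break?"""
    return sorted(
        name
        for name, required in CONSUMER_CONTRACTS.items()
        if not required.issubset(response.keys())
    )


old_payload = '{"total": 42.5}'
new_payload = '{"total": "42.5", "currency": "EUR", "items": [{"sku": "A1"}]}'

print(read_cart_total(old_payload), read_cart_total(new_payload))
print(broken_consumers({"total": 42.5}))
```

Run as part of the provider's build, a check like `broken_consumers` gives back some of the early feedback that a compiler error provides inside a single code base.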
What a “monolithic” deployment does do is hide this complexity in a single deployment unit. On the other hand, a microservice application exposes this complexity: each communication between services is “visible” on the microservices mesh infrastructure. What this results in can easily be seen if you just search the Internet for “microservices death star,” and you’ll see pictures of dependency graphs gone wild…
In a classical application development project, many of these problems can be mitigated when a single team – even if it is large – develops all the different parts of the application. At least there is a common schedule and an internal project management and (hopefully) lead architect to escalate and resolve potential conflicts. In a microservices application, where each microservice is developed relatively independently, possibly with different schedules and no or only weak overarching project or program management, a resource conflict between development teams could delay the implementation of your cross-microservice functionality.
For example, suppose your change requires an adjustment in the user interface, BFF (backend-for-frontend), choreography, shopping cart, and pricing services. If each of the services is developed independently, the usual resolution is to implement the functionality bottom up, with one new layer per sprint, instead of in one development sprint total. And this does not even account for changes identified late in the process that propagate back down and were not planned for in those services that had already implemented the supposedly finished change. So suddenly your microservices architecture forces you to do waterfall development: your application complexity has transformed into a communication and project management challenge.
Complex system communications
A microservice architecture breaks up an application into a number of independently deployed microservices that communicate with each other.
All the dependencies that used to be hidden in the monolith, possibly coded in dependency rules between components, now have to be coded in the infrastructure configuration. Infrastructure communication, however, is more difficult to maintain: there is no development environment (at least, not yet) that gives you red error marks when IP addresses or ports don’t match between producer and consumer.
It is important that the infrastructure configuration is provided “as code,” i.e., as a configuration file that can be checked, verified and maintained, and does not have to be created manually.
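Once the configuration exists as data, a simple check can play the role of the missing “red error marks.” The following is a minimal sketch; the `SERVICES` structure, the service names, and the port numbers are hypothetical and do not follow the format of any specific deployment tool.

```python
# Hypothetical infrastructure description: the port each service listens on
# and the port each consumer expects its providers to listen on.
SERVICES = {
    "pricing": {"port": 8081, "calls": {}},
    "cart": {"port": 8082, "calls": {"pricing": 8081}},
    "order": {"port": 8083, "calls": {"cart": 8082, "pricing": 8080}},  # stale port
}


def find_mismatches(services):
    """Return a message for every consumer/provider pair that does not match."""
    problems = []
    for consumer, spec in services.items():
        for provider, expected_port in spec["calls"].items():
            if provider not in services:
                problems.append(f"{consumer} -> {provider}: unknown service")
            elif services[provider]["port"] != expected_port:
                actual = services[provider]["port"]
                problems.append(
                    f"{consumer} -> {provider}: expects port {expected_port}, "
                    f"provider listens on {actual}"
                )
    return problems


for problem in find_mismatches(SERVICES):
    print(problem)
```

A check of this kind can run in the build pipeline before anything is deployed, which is only possible because the configuration is a file rather than a manual setup.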
Another challenge is that what once used to be an in-memory call is now a call that moves between processes, potentially over the network, introducing additional latency and speed penalties. Calling a remote microservice in a loop (which is not good within a monolith process already) adds latency for every loop iteration and can quickly render your service call unusable.
This can only be mitigated by careful service design that avoids loops around service calls, for example, by sending a collection of items in a single call and delegating the loop to the called service.
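The difference between a call in a loop and a single batched call can be sketched by counting round trips. `RemotePricingStub` is a hypothetical in-memory stand-in for a remote service; in a real system every round trip would add network latency.

```python
class RemotePricingStub:
    """Hypothetical stand-in for a remote pricing service; counts round trips."""

    def __init__(self, prices):
        self.prices = prices
        self.round_trips = 0

    def get_price(self, sku):
        self.round_trips += 1            # one network call per item
        return self.prices[sku]

    def get_prices(self, skus):
        self.round_trips += 1            # one network call for the whole list
        return {sku: self.prices[sku] for sku in skus}


prices = {"A1": 10.0, "B2": 5.5, "C3": 1.25}
cart = ["A1", "B2", "C3"]

# Chatty design: the loop lives in the caller.
chatty = RemotePricingStub(prices)
total_chatty = sum(chatty.get_price(sku) for sku in cart)

# Batched design: the loop is delegated to the called service.
batched = RemotePricingStub(prices)
total_batched = sum(batched.get_prices(cart).values())

print(total_chatty, chatty.round_trips)
print(total_batched, batched.round_trips)
```

Both designs compute the same total, but the chatty one pays the network latency once per cart item while the batched one pays it once per request.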
Testing and analysis
The size of a microservice is limited. One idea here is to reduce the complexity of each service and to be able to build and test each service independently of other services. For each service, this results in a simpler and more straightforward development process.
What we found in larger projects that implemented multiple microservices was that the integration and testing approach was often lacking, as the focus of each microservice team was, of course, its own microservice. For example, logging would be done in a different format by different microservices, and the differing formats made it difficult to get all the necessary information out of the logs.
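One way to avoid diverging log formats is a small formatter that every team shares. The following minimal sketch uses Python's standard `logging` module; the JSON field names and the `correlation_id` attribute are assumptions made for illustration, not an established standard.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Shared formatter: every service emits the same JSON fields."""

    def format(self, record):
        return json.dumps({
            "service": record.name,
            "level": record.levelname,
            "correlation_id": getattr(record, "correlation_id", None),
            "message": record.getMessage(),
        })


def make_logger(service_name, stream):
    logger = logging.getLogger(service_name)
    logger.setLevel(logging.INFO)
    logger.propagate = False             # avoid duplicate output via the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger


# Two services write to the same (here: in-memory) log sink.
sink = io.StringIO()
cart_log = make_logger("cart", sink)
pricing_log = make_logger("pricing", sink)

cart_log.info("item added", extra={"correlation_id": "req-1"})
pricing_log.info("price calculated", extra={"correlation_id": "req-1"})

entries = [json.loads(line) for line in sink.getvalue().splitlines()]
print([entry["service"] for entry in entries if entry["correlation_id"] == "req-1"])
```

With a common format and a correlation ID passed along with each request, a single query can reconstruct the path of a request across services.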
Complexity of operations
Once a microservice is built, it needs to be deployed in several testing environments, and later in the production environment. As each microservice team is empowered to decide on its own which technology to use, how to deploy the service, and where or how to run it, each microservice will most likely be deployed and operated in a different way.
Add to that the fact that deploying services is still a complicated matter if you take it seriously, even though the cloud makes many things easier. You need to make sure to follow security rules (e.g., do not store your TLS keys in the code repository, protect your build infrastructure, and so on), ensure proper mesh features for resilience are used, and develop a way to separate code from configuration to avoid things like hardcoded IP addresses!
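Separating code from configuration can be as simple as reading connection details from the environment instead of hardcoding them. This is a minimal sketch; the variable names `PRICING_HOST` and `PRICING_PORT`, the default port, and the host name are hypothetical.

```python
import os


def load_pricing_url(env=os.environ):
    """Build the pricing service URL from environment settings, not from code."""
    host = env.get("PRICING_HOST")
    if not host:
        # Fail at startup with a clear message rather than at the first request.
        raise RuntimeError("PRICING_HOST is not set")
    port = int(env.get("PRICING_PORT", "8080"))   # assumed default port
    return f"http://{host}:{port}"


# In a deployment the values come from the environment of the running instance;
# a plain dict is passed here so the example is reproducible.
print(load_pricing_url({"PRICING_HOST": "pricing.internal", "PRICING_PORT": "9000"}))
```

The same build artifact can then be promoted unchanged from test to production, with only the environment differing between stages.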
If you are using a database as part of your infrastructure, the setup is easily done in test and development. But do you know how to configure and deploy scalable database clusters, with proper backup and recovery strategies just in case? (Yes, even cloud storage can break, so you should at least know how long your storage provider takes to restore after a failure.)
In the end, a mixture of technologies selected in the development phase results in the need to keep a larger list of skills available for operations and maintenance. And be sure: once a microservice is in production, your development team will at some point no longer enjoy providing 24/7 support.
Also note that the infrastructure cost for a cloud deployment may be higher for microservices compared to a more monolithic approach. That is caused by the base cost of each runtime: for example, the base JVM memory requirements are multiplied by the number of runtime instances.
If you are migrating an existing application from a monolithic approach to a microservice approach, you usually break out single domains from the application and deploy them into a new service.
However, decomposing a domain of course means that you introduce new network connections and the restrictions attached to them. Calling microservices from traditional applications can thus cause problems due to missing transaction management, and especially due to the completely different consistency and error handling approaches.
You should also plan to migrate from the “outside in,” by using the Strangler pattern. I would go further and try to have new-style microservices call old-style applications, but preferably not vice versa: classical applications need adaptations to be able to handle the different communication protocols, consistency levels, or error handling approaches.
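The routing idea behind the Strangler pattern can be sketched as a facade that sends already-migrated paths to the new service and everything else to the existing application. The handler functions, the `/catalog` prefix, and the path layout are hypothetical; in practice this role is usually taken by an API gateway or reverse proxy.

```python
def legacy_monolith(path):
    """Hypothetical stand-in for the existing application."""
    return f"legacy handled {path}"


def new_catalog_service(path):
    """Hypothetical stand-in for the first domain broken out as a microservice."""
    return f"catalog microservice handled {path}"


# Paths that have already been migrated; the list grows one domain at a time.
MIGRATED_PREFIXES = {"/catalog": new_catalog_service}


def strangler_facade(path):
    """Route migrated paths to new services, everything else to the monolith."""
    for prefix, handler in MIGRATED_PREFIXES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return handler(path)
    return legacy_monolith(path)


print(strangler_facade("/catalog/42"))
print(strangler_facade("/orders/7"))
```

Because callers only ever see the facade, each domain can be moved behind it without changing the clients, and the monolith shrinks from the outside in.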
(Figure legend: Eng = System of Engagement, Recs = System of Records)
Summary so far
In conclusion, using microservices promises faster time-to-market and better quality. For large projects using multiple microservices, however, the approach considerably increases coordination and operations effort and needs to be carefully planned.
The complexity of an application does not go away. It is just transformed, and you must manage it either way. The main microservices challenges are summarized in the following list:
- Designing decoupled, non-transaction systems as opposed to old-style monoliths is difficult.
- Keeping data consistent and available while scaling above and beyond traditional databases.
- Many more “moving parts” with many more potential error cases force using graceful degradation approaches.
- The increased number of deployment units and their dependencies in the infrastructure leads to complex configurations that need to be maintained.
- Duplication of efforts across implementation teams and an increased cost through multiple different technologies.
- Integrated testing is difficult when there are only separate microservice teams.
- Greater operational complexity through more moving parts, with more operational skills required from your development team.
Read part 2
In the second part, I will give you some best practices for developing microservices at scale and re-evaluate the promises of microservices against what we learned.