Article originally posted on InfoQ. Visit InfoQ
- The microservice architecture is still the most popular architectural style for distributed systems. But Kubernetes and the cloud native movement have redefined certain aspects of application design and development at scale.
- On a cloud native platform, observability of services is not enough. A more fundamental prerequisite is to make microservices automatable, by implementing health checks, reacting to signals, declaring resource consumption, etc.
- In the post-Kubernetes era, using libraries to implement operational networking concerns (such as Hystrix circuit breaking) has been completely overtaken by service mesh technology.
- Microservices must now be designed for “recovery”, by implementing idempotency from multiple dimensions.
- Modern developers must be fluent in a programming language to implement the business functionality, and equally fluent in cloud native technologies to address the non-functional infrastructure level requirements.
The microservices hype started with a bunch of extreme ideas about organizational structure, team size, service size, rewriting and throwing away services rather than fixing them, avoiding unit tests, etc. In my experience, most of these ideas were proven wrong, impractical, or at the very least not generally applicable. Nowadays, most of the remaining principles and practices are so generic and loosely defined that they would stand true for many years to come without meaning much in practice.
Having been adopted a couple of years before Kubernetes was born, microservices is still the most popular architectural style for distributed systems. But Kubernetes and the cloud native movement have redefined certain aspects of application design and development at scale. In this article, I want to question some of the original microservices ideas and acknowledge the fact that they are not standing as strong in the post-Kubernetes era as they were before.
Not only observable, but also automatable services
Observability has been a fundamental principle of microservices from the very beginning. While it stands true for distributed systems in general, today (on Kubernetes specifically) a large portion of it is a given, out of the box, at the platform level (such as process health checks, and CPU and memory consumption). The very minimum requirement is for an application to log to the console in JSON format. From there on, the platform can track resource consumption, do request tracing, and gather all kinds of metrics, error rates, etc., without much service-level development effort.
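As an illustration of that minimum requirement, here is a small sketch of console logging in JSON using only the Python standard library. The field names (`ts`, `level`, `logger`, `message`) and the helper `build_logger` are assumptions made for this example, not a format that Kubernetes or any particular log collector mandates.

```python
import json
import logging
import sys
import time


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            # Field names are illustrative; adapt them to your log pipeline.
            "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name: str, stream=None) -> logging.Logger:
    """Return a logger that writes JSON lines to stdout (or a given stream)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger
```

A service would call `build_logger("orders").info("order accepted: %s", order_id)` and leave collection, indexing, and correlation of the resulting lines to the platform.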
On cloud native platforms, observability is not enough. A more fundamental prerequisite is to make microservices automatable, by implementing health checks, reacting to signals, declaring resource consumption, etc. It is possible to put almost any application in a container and run it. But creating a containerized application that can be automated and orchestrated effectively by a cloud-native platform requires following certain rules. Following these principles and patterns will ensure that the resulting containers behave like good cloud-native citizens in most container orchestration engines, allowing them to be scheduled, scaled, and monitored in an automated fashion.
Rather than observing what is happening in a service, we want the platform to detect anomalies and reconcile them as declared. How it does so, whether by no longer directing traffic to a service instance, restarting it, scaling up or down, moving the service to another healthy host, retrying a failing request, or something else, does not matter. If the service is automatable, all corrective actions occur automatically, and we only have to describe the desired state rather than observe and react. A service should be observable, but also rectifiable by the platform without human intervention.
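To make "implementing health checks" and "reacting to signals" more concrete, the following is a minimal sketch using only the Python standard library. The probe paths `/healthz` and `/readyz`, the port 8080, and the names `ServiceState` and `probe_status` are assumptions chosen for illustration; the platform would be configured separately to call whichever paths a service actually exposes.

```python
import signal
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class ServiceState:
    """Thread-safe flags that probes can query."""

    def __init__(self) -> None:
        self._ready = threading.Event()
        self._stopping = threading.Event()

    def mark_ready(self) -> None:
        self._ready.set()

    def begin_shutdown(self) -> None:
        # Stop advertising readiness first so traffic can be routed elsewhere.
        self._ready.clear()
        self._stopping.set()

    @property
    def ready(self) -> bool:
        return self._ready.is_set()

    @property
    def stopping(self) -> bool:
        return self._stopping.is_set()


def probe_status(path: str, state: ServiceState) -> int:
    """Map a probe path to an HTTP status code (paths are hypothetical)."""
    if path == "/healthz":   # liveness: the process is up and answering
        return 200
    if path == "/readyz":    # readiness: safe to receive traffic
        return 200 if state.ready else 503
    return 404


def make_handler(state: ServiceState):
    class ProbeHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            self.send_response(probe_status(self.path, state))
            self.end_headers()

        def log_message(self, *args) -> None:
            pass  # keep probe requests out of the application log

    return ProbeHandler


def main() -> None:
    state = ServiceState()
    server = HTTPServer(("0.0.0.0", 8080), make_handler(state))

    def on_sigterm(signum, frame) -> None:
        state.begin_shutdown()
        # shutdown() blocks until serve_forever() returns, so it must run
        # on a different thread than the one serving requests.
        threading.Thread(target=server.shutdown, daemon=True).start()

    signal.signal(signal.SIGTERM, on_sigterm)
    state.mark_ready()
    server.serve_forever()
    server.server_close()
```

Calling `main()` starts the probe server. On SIGTERM the service first reports itself as not ready and then stops serving, which gives the platform a clear signal to route traffic elsewhere before the process exits.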
Smart platform and smart services but with the right responsibilities
While transitioning from SOA to the microservices world, the notion of “smart endpoints and dumb pipes” was another fundamental shift in service interactions. In the microservices world, the services would not rely on the presence of a centralized smart routing layer, but instead rely on smart endpoints that possess some platform-level features. That was achieved by embedding some of the capabilities of the traditional ESBs in every microservice and transitioning to lightweight protocols that don’t have business logic elements.
While this is still a popular way of implementing service interaction over an unreliable networking layer (with libraries such as Hystrix), now, in the post-Kubernetes era, it has been completely overtaken by service mesh technology. Interestingly, the service mesh is even smarter than the traditional ESBs used to be. The mesh can do dynamic routing, service discovery, load balancing based on latency, response type, metrics and distributed tracing, retries, timeouts, you name it.
The difference from the ESB is that, rather than one centralized routing layer, with a service mesh each microservice typically has its own router: a sidecar container that performs the proxying logic, with an additional central management layer. And more importantly, the pipes (the platform and the service mesh) are not holding any business logic; they are purely focused on the infrastructure concerns, leaving the service to focus on the business logic. As shown in the diagram, this represents an evolution of the learnings from ESBs and microservices to fit the dynamic and unreliable nature of cloud environments.
SOA vs MSA vs CNA
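For contrast, the library-based "smart endpoint" approach meant embedding logic like the following in every service. This is a deliberately simplified sketch of the circuit breaker pattern, not Hystrix's actual API or behavior; the class names and the `max_failures` and `reset_after` parameters are assumptions made for illustration, and the class is not thread-safe.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    """Simplified in-process circuit breaker (illustrative only)."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit open, call rejected")
            # Cool-down elapsed: let one trial call through (half-open).
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures,
            # (re)opens the circuit and restarts the cool-down.
            if self._opened_at is not None or self._failures >= self.max_failures:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

With a service mesh, comparable behavior is configured in the sidecar proxy rather than coded and versioned inside each service, which is the shift in responsibilities the diagram describes.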
Looking at the other aspects of the services, we notice that cloud native is not only about endpoints and service interactions. The Kubernetes platform (with all the additional technologies) also takes care of resource management, scheduling, deployment, configuration management, scaling, service interaction, etc. Rather than calling it “smart proxy and dumb endpoints” again, I think it is better described as a smart platform and smart services with the right responsibilities. It is not only about the endpoints; it is instead a complete platform, automating all the infrastructure aspects of services that primarily focus on business functionality.
Don’t design for failure, design for recovery
Microservices running in cloud native environments, where the infrastructure and networking are inherently unreliable, have to be designed for failure. There is no question about it. But more and more failures are detected and handled by the platform, and there is less left to catch from within a microservice. Instead, think about designing your service for recovery by implementing idempotency from multiple dimensions.
The container technology, the container orchestrators, and the service mesh can detect and recover from many failures: infinite loops – CPU shares, memory leaks and OOM – health checks, disk hogs – quotas, fork bombs – process limits, bulkheading and process isolation – memory limits, latency and response based service discovery, retry, timeouts, auto scaling, etc. Not to mention that, with the transition to the serverless model, where a service lives only for a few milliseconds to handle one single request, concerns around garbage collection, thread pools, and resource leakage are becoming less and less relevant as well.
With all this and more handled by the platform, think about your service as a hermetic black box that will be started and stopped many times, so make the service idempotent for restarts. Your service will be scaled up and down multiple times, so make it safe for scaling by making it stateless. Assume many incoming requests will eventually time out, so make the endpoints idempotent. Assume many outgoing requests will temporarily fail and the platform will retry them for you, so make sure you consume idempotent services.
In order to be suitable for automation in cloud native environments a service must be:
- Idempotent for restarts (a service can be killed and started multiple times).
- Idempotent for scaling up/down (a service can be autoscaled to multiple instances).
- Idempotent service producer (other services may retry calls).
- Idempotent service consumer (the service or the mesh can retry outgoing calls).
If your service always behaves the same way when the above actions are performed one or multiple times, then the platform will be able to recover your services from failures without human intervention.
Lastly, keep in mind that all the recovery mechanisms provided by the platform are only local optimizations. As nicely put by Christian Posta, application safety and correctness in a distributed system is still the responsibility of the application. An overall business process-wide mindset (which may span multiple services) is necessary for designing a holistically stable system.
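One common way to make a service producer idempotent is to have callers attach a unique key to each logical operation and to return the stored result when the same key is seen again. The sketch below illustrates that idea; `PaymentService`, `charge`, and the `idempotency_key` parameter are hypothetical names. It keeps results in process memory purely for brevity, so a real service that must also survive restarts and scale to several instances would need to hold this state in a shared, durable store instead.

```python
import threading
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PaymentResult:
    payment_id: str
    amount_cents: int


class PaymentService:
    """Hypothetical service whose 'charge' operation is safe to retry."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # In-memory only for illustration; use a shared durable store
        # so that restarts and multiple instances see the same keys.
        self._results: Dict[str, PaymentResult] = {}
        self.charges_executed = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> PaymentResult:
        with self._lock:
            previous = self._results.get(idempotency_key)
            if previous is not None:
                # A retry of an operation already performed: return the
                # original outcome instead of charging a second time.
                return previous
            self.charges_executed += 1
            result = PaymentResult(f"pay-{self.charges_executed}", amount_cents)
            self._results[idempotency_key] = result
            return result
```

If the caller or the mesh retries `charge("order-42", 500)` after a timeout, the side effect happens once and every attempt observes the same result.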
Hybrid development responsibilities
More and more of the microservices principles are implemented and provided as capabilities by Kubernetes and its complementary projects. As a consequence, a developer has to be fluent in a programming language to implement the business functionality, and equally fluent in cloud native technologies to address the non-functional infrastructure level requirements while implementing a capability fully.
The line between business requirement and infrastructure (operational or cross functional requirements, or system quality attributes) is always blurry and it is not possible to take one aspect and expect somebody else to do the other. For example, if you implement the retry logic in the service mesh layer, you have to make the consumed service idempotent at the business logic or database layer within the service. If you use a timeout at service mesh level, you have to synchronize the service consumer timeouts within the service. If you have to implement a recurring execution of a service, you have to configure a Kubernetes job to do the temporal execution.
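The timeout example can be reasoned about with simple arithmetic. The sketch below assumes a mesh that makes one initial attempt plus a fixed number of retries, each with its own per-try timeout and an optional fixed pause in between; under those assumptions, a timeout inside the calling service that is shorter than the worst-case total means the caller gives up while the mesh is still retrying. `MeshRetryPolicy` and its fields are hypothetical and do not mirror the configuration schema of any specific mesh.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeshRetryPolicy:
    """Hypothetical mirror of retry settings configured in the mesh."""
    per_try_timeout_s: float
    max_retries: int          # retries after the first attempt
    backoff_s: float = 0.0    # fixed pause between attempts

    def worst_case_s(self) -> float:
        """Longest time the mesh may spend before reporting a failure."""
        attempts = 1 + self.max_retries
        return attempts * self.per_try_timeout_s + self.max_retries * self.backoff_s


def client_timeout_is_consistent(client_timeout_s: float,
                                 policy: MeshRetryPolicy) -> bool:
    """True if the in-service timeout leaves room for all mesh attempts."""
    return client_timeout_s >= policy.worst_case_s()
```

For instance, with a 2-second per-try timeout, two retries, and a half-second pause, the mesh may take up to 3 × 2 + 2 × 0.5 = 7 seconds, so a 5-second timeout in the consumer would cut the retries short.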
Going forward, some of the service capabilities will be implemented within the services as business logic, and others provided as platform capabilities. While using the right tool for the right task is a good separation of responsibilities, the proliferation of technologies greatly increases the overall complexity. Implementing even a service that is simple in terms of business logic requires a good understanding of distributed technology stacks, as the responsibilities are spread across every layer.
It is proven that Kubernetes can scale up to thousands of nodes, tens of thousands of pods and millions of transactions per second. But can it scale down? The threshold of application size, complexity, or criticality that justifies the introduction of the “cloud native” complexity is not clear to me yet.
It is interesting to see how the microservices movement gave so much momentum to the adoption of container technologies such as Docker and Kubernetes. While initially, it was the microservices practices driving these technologies forward, now it is Kubernetes defining the microservices architecture principles and practices.
As a recent example, we are not far from accepting the function model as a valid microservices primitive, rather than considering it an anti-pattern for nano services. We are not questioning cloud native technologies enough for their practicality and applicability to small and medium-sized cases, but instead are jumping in somewhat carelessly with excitement.
Kubernetes incorporates many of the learnings from ESBs and microservices, and as such, it is the ultimate distributed system platform. It is the technology defining the architectural styles rather than the other way around. Whether that is good or bad, only time will tell.
About the Author
Bilgin Ibryam (@bibryam) is a principal architect at Red Hat, and a committer and member of the ASF. He is an open source evangelist, blogger and the author of the Camel Design Patterns and Kubernetes Patterns books. In his day-to-day job, Bilgin enjoys mentoring, coding and leading developers to be successful with building cloud-native solutions. His current work focuses on application integration, distributed systems, messaging, microservices, devops, and cloud-native challenges in general. You can find him on Twitter, LinkedIn or his blog.