Recently, Ben Linders reported here on a talk about observability and distributed systems given by Pierre Vincent, site reliability engineering (SRE) manager at Poppulo. We’ve had several articles over the years on the topic of observability, particularly in the areas of serverless and microservices. There are also a number of open source projects in this space to assist developers, including Prometheus, OpenTracing, Envoy and, most recently, Kiali.
In a recent article, Zach Jory from Aspen Mesh has written up further thoughts on the topic. He starts by observing something several other people have noted over the years: whilst breaking monoliths down into microservices may be a necessary activity, it does not automatically lead to simpler systems, and some activities can become more difficult:
An obvious area where it adds complexity is communications between services; visibility into service to service communications can be hard to achieve, but is critical to building an optimized and resilient architecture.
Zach states that whilst monitoring, which aims to give an idea of the overall health of a system, has been around for many years, the concept of observability, which aims to provide data on the behaviour of systems, has become increasingly important for cloud-based distributed systems such as those commonly found with microservices.
Observability is about data exposure and easy access to information which is critical when you need a way to see when communications fail, do not occur as expected or occur when they shouldn’t. The way services interact with each other at runtime needs to be monitored, managed and controlled. This begins with observability and the ability to understand the behavior of your microservice architecture.
Zach believes that the rise of service mesh technologies, such as Istio, has made observability a number one requirement for those looking to use or develop with them. (Although his company has its own service mesh implementation, the points he makes are implementation-agnostic and so worth understanding.) He then goes on to discuss the top two observability features provided by a service mesh. He starts with tracing, which lets you know which microservices are involved in which transactions:
Distributed tracing is great for debugging and understanding your application’s behavior. The key to making sense of all the tracing data is being able to correlate spans from different microservices which are related to a single client request. To achieve this, all microservices in your application should propagate tracing headers.
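As a rough illustration of what propagating tracing headers can look like inside a service, here is a minimal Python sketch. It assumes the B3-style header names used by Envoy-based meshes such as Istio; the exact list is not given in Zach's article and should be checked against the documentation of whichever mesh is in use. The function name `propagate_trace_headers` is hypothetical.

```python
# Assumption: B3-style tracing headers as used by Envoy-based meshes such as
# Istio. The list is illustrative; confirm it against your mesh's documentation.
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "x-ot-span-context",
)


def propagate_trace_headers(incoming_headers):
    """Return the tracing headers to copy from an inbound request onto
    any outbound calls made while handling it.

    Header names are matched case-insensitively and returned in lower case.
    """
    lowered = {name.lower(): value for name, value in incoming_headers.items()}
    return {name: lowered[name] for name in TRACE_HEADERS if name in lowered}


if __name__ == "__main__":
    inbound = {
        "X-Request-Id": "req-1",
        "X-B3-TraceId": "463ac35c9f6413ad",
        "X-B3-SpanId": "a2fb4a1d1a96d312",
        "Content-Type": "application/json",
    }
    # Only the tracing headers are forwarded; Content-Type is left out.
    print(propagate_trace_headers(inbound))
```

A service would merge the returned dictionary into the headers of each downstream request it makes, so that the spans recorded for those calls can be correlated with the original client request.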
Then there’s Metrics, based upon the telemetry data which can be gathered automatically across the service mesh. Important metrics to gather include request volume, request duration, latency and request size.
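To make these metrics a little more concrete, the sketch below aggregates a handful of per-request records into request volume, duration and request size for each pair of communicating services. It is only an illustration: the `RequestRecord` type, its fields and the `summarize` function are hypothetical, and a real service mesh would collect and export this telemetry itself rather than rely on application code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """Hypothetical telemetry record for one service-to-service request."""
    source: str
    destination: str
    duration_ms: float
    request_bytes: int


def summarize(records):
    """Group records by (source, destination) and compute simple metrics."""
    grouped = defaultdict(list)
    for record in records:
        grouped[(record.source, record.destination)].append(record)

    summary = {}
    for pair, items in grouped.items():
        durations = [item.duration_ms for item in items]
        summary[pair] = {
            "request_volume": len(items),
            "mean_duration_ms": sum(durations) / len(durations),
            "max_duration_ms": max(durations),
            "total_request_bytes": sum(item.request_bytes for item in items),
        }
    return summary


if __name__ == "__main__":
    sample = [
        RequestRecord("frontend", "orders", 10.0, 200),
        RequestRecord("frontend", "orders", 30.0, 400),
        RequestRecord("orders", "payments", 50.0, 100),
    ]
    for pair, metrics in summarize(sample).items():
        print(pair, metrics)
```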
Most failures in the microservices space occur during the interactions between services, so a view into those transactions helps teams better manage architectures to avoid failures. Observability provided by a service mesh makes it much easier to see what is happening when your services interact with each other, making it easier to build a more efficient, resilient and secure microservice architecture.
Interestingly, Zach touches on something we discussed at the end of 2017 as a prediction for 2018: that monitoring and observability would be a key factor for developers. At that time Péter Márton, CTO of RisingStack, had this to say:
To put microservices monitoring and observability to a next level and bring the era of the next APM tools, an open, vendor-neutral instrumentation standard would be needed like OpenTracing. This new standard needs to be applied by APM vendors, service providers, and open-source library maintainers as well.
Fast forward eight months and we are definitely seeing the rise of projects concerned with making observability a key part of microservice architectures, whether or not they are based upon service mesh technologies. We are also seeing developer-focussed efforts, such as Eclipse MicroProfile, add support for OpenTracing.