Understanding Observability
Recently I gave a talk at my workplace about what observability is, how we arrived at the term, and what it involves. We did record the talk, but it included discussions about work for our clients, so I thought I would at least share the slides in case they help other folks.
Since slides alone can’t give the entire picture, I’ll put a short summary here and a link to the slides towards the end.
Slides 1-4
These are introductory slides. My motivation for the talk was to understand what observability is and to see whether I could explain it to my colleagues. These slides also lay out the agenda for the rest of the talk.
Slides 5-8
Here we take stock of our existing understanding of observability (o11y) and discuss why we would need another term when we are already familiar with monitoring, metrics, logs, traces, APM, etc.
Slides 9-18
Here we try to clarify our ideas, understand the differences between monitoring and o11y, and see why o11y is needed.
Slides 19-23
I wanted to establish a clear understanding of metrics, logs, events, and traces, since I’d be using these terms a lot in the rest of the talk.
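To make the distinction between these signals a little more concrete, here is a minimal sketch that is not taken from the slides. It uses only the Python standard library, and every name in it (`handle_checkout`, `span`, `log`, the `checkout_requests_total` counter) is hypothetical. It shows one request producing a metric (an aggregate number), a log (a discrete timestamped record), and a trace (timed spans linked by a shared trace ID and parent/child IDs).

```python
import json
import time
import uuid
from collections import Counter
from contextlib import contextmanager

# In-memory stand-ins for real backends (assumption: illustration only).
metrics = Counter()  # metric: a cheap aggregate number
logs = []            # log/event: discrete timestamped records
spans = []           # trace: timed operations linked by trace_id / parent_id


def log(message, **fields):
    """Record one structured log line as JSON."""
    logs.append(json.dumps({"ts": time.time(), "msg": message, **fields}))


@contextmanager
def span(name, trace_id, parent_id=None):
    """Time a block of work and record it as a span when the block exits."""
    span_id = uuid.uuid4().hex[:16]
    start = time.perf_counter()
    try:
        yield span_id
    finally:
        spans.append({
            "name": name,
            "trace_id": trace_id,
            "span_id": span_id,
            "parent_id": parent_id,
            "duration_s": time.perf_counter() - start,
        })


def handle_checkout(order_id):
    """Hypothetical request handler that emits all three signals."""
    trace_id = uuid.uuid4().hex
    metrics["checkout_requests_total"] += 1
    with span("checkout", trace_id) as root_id:
        log("checkout started", order_id=order_id, trace_id=trace_id)
        with span("charge_card", trace_id, parent_id=root_id):
            time.sleep(0.001)  # pretend to call a payment service
    return trace_id
```

Calling `handle_checkout("order-42")` increments the counter once, appends one log line, and records two spans. The metric only says how many requests happened, the log says what happened to a particular order, and the spans say where the time went and how the operations nest. The log line carries the same trace ID as the spans, which is what lets the signals be correlated afterwards.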
Slides 24-31
In these slides, I wanted to paint a picture of how we arrived at the existing landscape. I feel this is needed to understand the similarities and overlap among a lot of the solutions out there, because every other hosted solution claims to be an observability solution (looking at you, APM folks!).
Slides 32-33
Although there are many challenges in implementing all the signals of observability, I only talk about the ones related to tracing. In the case of metrics and logs, most of the tooling is mature and most of the effort is spent on handling things at scale.
Slides 34-36
Once we are aware of the challenges, we need to talk about the solutions. So I present a comparative view of the existing vendors who market themselves as providing observability. This list is neither exhaustive nor up to date, but the important point I wanted to convey was how these solutions have evolved over time, just as the term itself has. We also talk about some technical solutions to the challenges discussed in the previous slides.
Slides 37-41
We talk about the OpenTelemetry project, which I’m very excited about. I also did a demo where I showcased collecting metrics and traces, as well as auto-instrumentation, on a toy microservices example run with docker-compose.
That’s all!
If this sounds interesting, click on the image below to go to the slides.
Also, please feel free to reach out if you have any feedback or just want to share what you know!