Observability is an increasingly vital consideration for software engineers looking to build better, more stable applications. Here is everything you need to know about observability.
The term "observability" started to gain serious momentum in software engineering circles around 2018, as a natural evolution of monitoring practices. By bringing together the raw outputs of metrics, events, logs, and traces, software developers could start to gain a real-time picture of how their software systems are performing and where issues might be occurring.
The concept itself, however, has deep roots in control theory, where observability is a measure of how well the internal state of a system can be inferred from its external outputs alone.
Now, with the broad shift towards distributed software systems through microservices and containers, the old adage of not being able to manage what you can't measure has never been more relevant.
Observability vs. monitoring
For many people, observability will just sound like a convenient rebranding of application monitoring, and any skepticism around the latest industry buzzword is justified. However, as my colleague David Linthicum puts it, there is a basic difference: Monitoring "is something you do (a verb); observability is an attribute of a system (a noun)," he wrote.
Taking things one step further, engineering manager and technical blogger Ernest Mueller wrote back in 2018 that "observability is a property of a system. You can monitor a system using various instrumentation, but if the system doesn't externalize its state well enough that you can figure out what's actually going on in there, then you're stuck."
As developers have broken up their applications into smaller chunks (called microservices), hosted them in containers across distributed cloud servers, and deployed them continuously under the all-seeing eye of the devops team, the need for true observability has become increasingly critical.
"As systems become more distributed, methods for building and operating them are rapidly evolving, and that makes visibility into your services and infrastructure more important than ever," software developer Cindy Sridharan wrote in her book Distributed Systems Observability.
"Observability is a superset of monitoring," Sridharan wrote. "It provides not only high-level overviews of the system's health but also highly granular insights into the implicit failure modes of the system. In addition, an observable system furnishes ample context about its inner workings, unlocking the ability to uncover deeper, systemic issues."
The three pillars of observability
There are three commonly agreed upon pillars of observability: metrics, traces, and logs.
Taken individually, these pillars represent a developer's ability to instrument and monitor their systems. Once brought together and presented in as close to real time as possible, you can start to make those systems observable.
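To make "brought together" concrete, here is a minimal sketch in Python using only the standard library. It assumes a hypothetical request handler that emits all three signals with a shared trace ID, so a failure counted in the metrics can be followed to its span and its log line. The names (handle_request, explain_errors, and the in-memory LOGS, METRICS, and SPANS sinks) are illustrative assumptions, not the API of any particular tool.

```python
import json
import time
import uuid
from collections import Counter

# Hypothetical in-memory sinks; a real system would ship each signal
# to its own backend (log store, metrics database, tracing system).
LOGS = []            # structured log lines (JSON strings)
METRICS = Counter()  # request counts keyed by (route, status)
SPANS = []           # one trace span per handled request


def handle_request(route, work):
    """Run `work()` and record a metric, a span, and (on failure) a log."""
    trace_id = uuid.uuid4().hex  # shared key that ties the signals together
    start = time.perf_counter()
    status = "ok"
    try:
        return work()
    except Exception as exc:
        status = "error"
        LOGS.append(json.dumps({
            "trace_id": trace_id,
            "route": route,
            "level": "error",
            "msg": str(exc),
        }))
        raise
    finally:
        # Runs on both success and failure, so every request is counted.
        METRICS[(route, status)] += 1
        SPANS.append({
            "trace_id": trace_id,
            "name": route,
            "duration_s": time.perf_counter() - start,
            "status": status,
        })


def explain_errors(route):
    """Join failed spans with their log records on the shared trace_id."""
    logs_by_trace = {}
    for line in LOGS:
        record = json.loads(line)
        logs_by_trace[record["trace_id"]] = record
    return [
        (span, logs_by_trace.get(span["trace_id"]))
        for span in SPANS
        if span["name"] == route and span["status"] == "error"
    ]
```

The metric says that a route is failing, the span says which request failed and how long it took, and the log says why. The shared identifier is what lets the three be queried as one picture rather than three separate ones.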
That being said, the three pillars do not miraculously add up to observability. "It's not about logs, metrics, or traces, but about being data-driven during debugging and using the feedback to iterate on and improve the product," Sridharan wrote.
Greg Ouillon, the CTO for Europe, the Middle East, and Africa at monitoring vendor New Relic, sees observability as a confluence of the software engineering and monitoring trends that have shaped the cloud era.
"Observability addresses these challenges by rethinking monitoring and adapting to the new technology paradigm," Ouillon said. "By providing you with a fully connected view of all software telemetry data in one place, real-time observability allows you to proactively master the performance of your digital architecture, accelerate innovation and software velocity, and reduce toil and operational costs."
Observability tools and vendor landscape
The vendor landscape is fairly complex when it comes to observability, as makers of logging, monitoring, and application performance management (APM) software all stake claims to offering observability tools. "Observability a year ago was a useful term, but now is becoming a buzzword," says Gartner analyst Josh Chessman.
Take log monitoring specialists like Splunk and Sumo Logic, both of which have moved further toward end-to-end observability by developing new features and making key acquisitions to round out their platforms. Splunk's acquisitions include cloud network performance monitoring specialist Flowmill and user and application performance monitoring specialist Plumbr in 2020. Combined with the $1 billion purchase of real-time monitoring company SignalFx in 2019, it is clear that Splunk wants to be a one-stop shop for observability tools.
Vendors like Dynatrace, Datadog, New Relic, SolarWinds, Scalyr (recently acquired by security specialist SentinelOne), and newcomer Honeycomb all also look to provide off-the-shelf instrumentation and observability as a service for engineering teams.
On the open source side, Grafana Labs has built a massively popular open source monitoring and observability platform. Apache SkyWalking is another open source observability tool that allows system administrators to identify issues, receive key alerts, and monitor overall system health, with or without a service mesh.
The OpenTelemetry initiative is another open source project that has rapidly grown in popularity. The sandbox project (which came about as a merger between OpenCensus and OpenTracing) sits with the Cloud Native Computing Foundation (CNCF) and has gathered broad support as an emerging industry standard for observability.
For developers looking to build their own observability stack from scratch, open source tools like Prometheus for metrics, Logstash for logs, and Jaeger for tracing can provide the building blocks required to cover the three pillars of observability.
The next phase of observability
The Holy Grail for users and vendors in the observability space, whether the toolkit is proprietary, open source, or even homegrown, is to automate away the fact-finding part of the process to the point where issues are automatically spotted and can be fixed before they affect users. Better still would be software that fixes faults before the developers are even aware of the issue on their dashboard.
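As a rough illustration of what "automatically spotted" can mean at its simplest, the sketch below flags a latency sample that sits far above a rolling baseline. It is a toy under stated assumptions: the LatencyWatcher class, the window size, the five-sample warm-up, and the three-standard-deviation threshold are all hypothetical choices made for this example, and commercial tools use considerably more sophisticated techniques.

```python
from collections import deque
from statistics import mean, stdev


class LatencyWatcher:
    """Flags latency samples far above a rolling baseline (hypothetical)."""

    def __init__(self, window=30, threshold=3.0, warmup=5):
        self.samples = deque(maxlen=window)  # recent "normal" samples
        self.threshold = threshold           # allowed std devs above the mean
        self.warmup = warmup                 # samples needed before judging

    def observe(self, latency_ms):
        """Return True if the sample looks anomalous against recent history."""
        anomalous = False
        if len(self.samples) >= self.warmup:
            baseline = mean(self.samples)
            spread = stdev(self.samples)
            if spread > 0 and (latency_ms - baseline) / spread > self.threshold:
                anomalous = True
        if not anomalous:
            # Only normal samples update the baseline, so a single spike
            # does not immediately become the new normal.
            self.samples.append(latency_ms)
        return anomalous
```

Feeding the watcher a steady stream of values around 100 ms and then a single 400 ms sample would flag only the spike. Detecting the issue is the easy half; deciding what to do about it automatically is where the harder work lies.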
There is also a growing community of startups and open source projects looking at the next crop of observability challenges, such as the Signoz.io open source observability platform for Kubernetes and microservices, or Jeli, a project founded by an ex-Netflix engineer that focuses on giving developer teams the tools to map where their code is failing against the structure of their organization.
Building a culture of observability
It's important to remember that the three pillars alone do not instantly combine to achieve observability; people and process must also be aligned around a set of shared goals.
"The process of knowing what information to expose and how to examine the evidence (observations) at hand, to deduce likely answers behind a system's idiosyncrasies in production, still requires a good understanding of the system and domain, as well as a good sense of intuition," Cindy Sridharan wrote.
Observability should not be the goal in and of itself, but rather viewed as a means to build and operate more reliable software for customers. "The value of the observability of a system primarily stems from the business and organizational value derived from it," Sridharan wrote. "Being able to debug and diagnose production issues quickly not only makes for a great end-user experience, but also paves the way toward the humane and sustainable operability of a service, including the on-call experience."
Those dual incentives of better customer outcomes and a potentially easier life for software engineers should be enough to drive many organizations towards gaining better observability of their systems for years to come.


