Speaker: Yury Niño Roa
(She / her / hers)
Cloud Infrastructure Engineer @Google
Session + Live Q&A
Chaos Engineering Observability with Visual Metaphors
Observability is key to operating a system in production. It is essential during an incident, when an operator has to interrogate, inspect, and piece together what happened in order to prevent a similar event. In those scenarios, chaos engineering and observability are closely connected: together they provide concepts, practices, and disciplines that help build reliability into systems.
Because operators and engineers shape mental models while practising those disciplines, it is critical to provide the right metrics, dashboards, and visualisations. Both academia and the tech industry have focused heavily on improving metrics and dashboards. Metrics based on the golden signals of monitoring, well-established commercial APM solutions, and the out-of-the-box products of the major cloud providers are evidence of this. However, the visualisation of these metrics and the selection of appropriate visual metaphors in dashboards have not evolved at the same pace. Histograms, line plots, and pie charts remain practically the only visual strategies available in the market.
This talk introduces a new actor: visual metaphors. We will talk about visualisation and how to use colours, textures, and shapes to create mental models that enrich the options available in observability and chaos engineering. I will present state-of-the-art visualisation techniques, specifically treemaps, heatmaps, and visualisations based on city, cosmic, geocentric, and sky metaphors. Finally, I will present the results of a survey conducted after an operations team used these metaphors during on-call activities.
Session + Live Q&A
Could Observability-Driven Development Be the Next Leap?
Twenty years ago, Kent Beck coined the term "test-driven development": write tests first, develop the code later. Today, even among developers who do not practise strict TDD, the idea of writing code without tests is an immediate warning sign. Yet most teams still ship code without adequate instrumentation to observe real system behaviour in production. Is it time to move to observability-driven development: "instrument first, develop later"?