SESSION + Live Q&A
Amplifying Sources of Resilience: What Research Says
Building robust software systems means anticipating how failures may occur with components and subsystems and developing answers to the question:
“What is needed for the design of systems that prevents or limits catastrophic failure?” Investing in, developing, and sustaining the adaptive capacity to cope with unexpected situations is at the core of Resilience Engineering. In the software community, this means developing (continually!) ever-better answers to the question:
“When our preventative designs fail us, what are ways that teams of engineers successfully anticipate, resolve, and learn from those catastrophes?”
The Resilience Engineering community has been studying how people in high-consequence/high-tempo domains answer this latter question. Applying Resilience Engineering thinking and paradigms to the world of software engineering and operations is still in its infancy, but we have some promising routes for making progress. This talk will outline productive avenues to locate, amplify, support, and build this capacity that exists (sometimes invisibly) in the expertise of your organization. Spoiler: looking closely at the origins, handling, and perception of incidents is part of this story.
Speaker
John Allspaw
DevOps/Resilience Engineering Thought Leader, Previously CTO @Etsy & Co-founder of @AdaptiveCLabs
John Allspaw has worked in software systems engineering and operations for over twenty years in many different environments. John’s publications include the books The Art of Capacity Planning (2009) and Web Operations (2010) as well as the foreword to “The DevOps Handbook.” His 2009...
From the same track
Building Resilient Serverless Systems
In this brave new world of serverless, we entrust our vendors with keeping the infrastructure up and running. However, when even cloud behemoths like Amazon Web Services and Google Cloud have outages and failures, how can we build resilient systems? John Chapin explains how to use...
Johnathan Chapin
Cloud Technology Consultant with an expertise in Serverless Computing
An Engineer's Guide to a Good Night's Sleep
As organisations look to empower engineers more and embrace DevOps practices, we have seen the support role change quite a bit too. Developers are moving from being purely third-line support to working more collaboratively with engineers and operational staff. Also as we move to cloud native...
Nicky Wrightson
Ventures CTO @blenheimchalcot
Learning From Chaos: Architecting for Resilience
In this talk Russ Miles, CEO of ChaosIQ, will share how leading organisations are successfully adopting chaos engineering to encourage a mindset of "architecting for resilience". Through chaos engineering, architects are able to establish a true "learning system" where everyone is involved in...
Russell Miles
CEO of @chaosiqio
How Condé Nast Succeeds by a Culture That Embraces Failure
Systems architectures are increasingly diverse to serve the growing demands for scalability, fault tolerance, isolation, and extensibility. But the trade-off is ever more complex software to operate and maintain, often with no single shared view of the entire design. This is especially true with the...
Crystal Hirschorn
VP Engineering, Global Strategy & Operations @CondeNast