SESSION + Live Q&A

Rethinking How the Industry Approaches Chaos Engineering

To work out how to achieve the reliability and resilience that drive our businesses forward, organizations must be able to look back at past blunders without hindsight bias obscuring them. Resilient organizations don’t treat past successes as a reason for confidence. Instead, they use them as an opportunity to dig deeper, find underlying risks, and refine their mental models of how our systems succeed and fail.

Chaos Engineering has key components beyond building tools for experimenting in production and running game days. Understanding each individual’s concerns, ideas, and mental models of how the system is structured, and learning where your organization excels in technical and human resilience, are things that can’t be automated away by code. This talk will address the three phases of Chaos Engineering and the hidden goals within each phase that might be the greatest benefit of all: using Chaos Engineering as a way to distill expertise.

The chronically under-invested phases of Chaos Engineering in our industry are the Before and After phases, and their work tends to fall on a single individual, usually a facilitator. This is someone who can act as a third party during the experiment but who, beforehand, learns what the team is going through, what their systems are, and how it all works. If we only optimize for finding issues before they become incidents, we miss the real point of Chaos Engineering: refining the mental models of our systems and distilling expertise.

In this talk, we focus on the Before and After phases of developing Chaos Engineering experiments (whether they are game days or software-driven) and develop important questions to ask in each of these phases. We will also dig into some of the Ironies of Automation present in Chaos Engineering today.


Speaker

Nora Jones

Senior Developer/Engineer

Nora is a dedicated and driven technology leader and software engineer with a passion for people and reliable software, as well as the intersection between those two worlds. She truly believes that safety is pivotal in software development today. She co-wrote two O’Reilly books on...


Location

Whittle, 3rd flr.

Track

Chaos and Resilience: Architecting for Success

Topics

Incident Management, Resilient Systems


From the same track

SESSION + Live Q&A | Interview Available

Better Resilience Adoption through UX

Too often, attempts to bring resilience engineering to an organization fall flat. Perhaps there’s some initial interest, but it wavers under the crushing weight of JIRA queues and sprint reviews. The tools are there, but no one’s using them. This session will go over three case...

Randall Koutnik

UI Engineer

SESSION + Live Q&A | Interview Available

Preparing for the Unexpected

Convincing engineers to be on call isn’t always straightforward. In 2019, the Customer Products group at the Financial Times set out to make their out-of-hours support process more sustainable after losing a number of people from their on-call team. In this talk you’ll discover how to...

Samuel Parkinson

Principal Engineer @FinancialTimes

SESSION + Live Q&A | Incident Management

Growing Resilience: Serving Half a Billion Users Monthly at Condé Nast

Serving over half a billion customers monthly while keeping service availability high is a monumental task. Condé Nast operates in nearly 40 countries and is best known for its portfolio of household brands such as Vogue, Wired, Vanity Fair, and The New Yorker. Our globally distributed...

Crystal Hirschorn

VP Engineering, Global Strategy & Operations @CondeNast

SESSION + Live Q&A | Incident Management

How Many Is Too Much? Exploring Costs of Coordination During Outages

Service outages can attract a lot of attention from a wide range of participants, particularly when the service supports a business-critical function. These ‘stakeholders’ represent multiple roles with different experience, responsibilities, expertise and knowledge about how the system...

Laura Maguire

Cognitive Systems Engineer & Researcher
