SESSION + Live Q&A

Chaos Engineering: Why the World Needs More Resilient Systems

There are those of us who are motivated to build resilient systems, improve uptime, move fast and keep systems reliable. Then there are those of us who feel overwhelmed by our to-do lists and the features or projects we feel we need to get out the door.

The world needs more resilient systems because the world needs engineers in this for the long haul. We can create a better future for ourselves, those who come after us, our customers and our wider teams by focusing on building resilient systems. How do we make it easier for everyone to build resilient systems?

It is not easy to build resilient systems, but that doesn’t mean we shouldn’t try. Engineers love a technical challenge. In this talk I will explain how focusing on the detection, mitigation, resolution and prevention of incidents is a great place to start. I will share my experiences using chaos engineering to build resilient systems... even when you can’t build your systems from scratch.


Speaker

Tammy Butow

Principal Site Reliability Engineer @Gremlin

Tammy Butow is a Principal SRE at Gremlin where she works on Chaos Engineering, the facilitation of controlled experiments to identify systemic weaknesses. Gremlin helps engineers build resilient systems using their control plane and API. Tammy previously led SRE teams at Dropbox responsible for...

Location

Fleming, 3rd flr.

Track

Architecting for Failure

Topics

Chaos Engineering, Incident Management, Scale, Resilient Systems, Architecture

From the same track

SESSION + Live Q&A Multi-cloud

Best Practices Building Resilient Systems

Architecting for Failure covers the challenges (both technical and organizational) of constantly improving service delivery of a growing global company with a 24x7x365 service redundancy requirement. The talk focuses on best practices and lessons learned in building resilient systems. Topics...

Pablo Jensen

CTO @Sportradar

SESSION + Live Q&A BlockChain

Architecting the Blockchain for Failure

In this talk I’ll be discussing some of the different approaches taken in the Ethereum blockchain for handling failure. We’ll cover the public Ethereum blockchain and the steps taken to ensure a robust execution environment for this massively decentralised computer. We’ll...

Conor Svensson

Founder of blk.io, author of web3j

SESSION + Live Q&A Event Driven Architecture

How Events Are Reshaping Modern Systems

Event-driven architecture and design have been getting a lot of attention in recent years. It’s an old concept that has been around for decades, so why this sudden spike of interest? In this talk, we will explore the nature of events, what it means to be event-driven, and how we can unleash the...

Jonas Bonér

Founder & CTO @Lightbend / Creator of Akka

SESSION + Live Q&A Machine Learning

Pragmatic Resiliency: Super 6 & Sky Bet Evolution

Sky Sports Super 6 is a free football results prediction game, launched in 2008. It’s extremely popular with over a million entries per week and drives a substantial proportion of our traffic at peak time, putting heavy load on our login/single sign on systems. This talk will focus on the...

Michael Maibaum

Chief Architect @SkyBet
