Keynote

Monkeys in Lab Coats: Applying failure testing research @Netflix

Industry and academia need each other. Far from the tire fires of production, university researchers have the time to ask big questions. Sometimes they get lucky and obtain answers that change how we think about large-scale systems! But detached from real-world constraints, systems research in academia risks irrelevance: inventing and solving imaginary problems. Industry owns the data, the workloads and the know-how to realize large-scale infrastructures. Practitioners want answers to the big questions, but often fear the risks associated with research. Academics, for their part, seek real-world validation of their ideas, but are often unwilling to adapt their “beautiful” models to the gritty realities of production deployments. Collaborations between industry and academia -- despite their deep interdependence -- are rare.

In this talk, we present our experience of a fruitful industry/academic collaboration. We describe how a “big idea” -- lineage-driven fault injection -- evolved from a theoretical model into an automated failure testing system that leverages Netflix’s state-of-the-art fault injection and tracing infrastructures. This collaboration required us to take risks, to accept defeats, and to constantly evolve our approach to “make it work”. We sketch the architecture of the automated failure testing system we built and some of its discoveries, while providing intuition for why it works. Along the way, we will describe the challenges (expected as well as unexpected, technical as well as ideological) that arose, and how we overcame them.


Speaker

Kolton Andrus

Founder of Gremlin Inc, former Netflix

Kolton is the founder of Gremlin Inc - helping companies build more robust services. He was a Chaos Engineer at Netflix, focused on the resilience of the Edge services. He designed and built FIT: Netflix’s failure injection service. Prior to that, he improved the performance and reliability of the...


Speaker

Peter Alvaro

Computer Science Assistant Professor @UniversityofCalifornia

Peter Alvaro is an Assistant Professor of Computer Science at the University of California, Santa Cruz. His research focuses on using data-centric languages and analysis techniques to build and reason about data-intensive distributed systems, in order to make them scalable, predictable and robust...


Location

Fleming / Whittle, 3rd fl.

Topics

Designing for Failure, Research, Silicon Valley


Tracks

Discover some of the topics you will see at QCon London.
