Keynote
Monkeys in Lab Coats: Applying failure testing research @Netflix
Industry and academia need each other. Far from the tire fires of production, university researchers have the time to ask big questions. Sometimes they get lucky and obtain answers that change how we think about large-scale systems! But detached from real-world constraints, systems research in academia risks irrelevance: inventing and solving imaginary problems. Industry owns the data, the workloads and the know-how to realize large-scale infrastructures. Companies want answers to the big questions, but often fear the risks associated with research. Academics, for their part, seek real-world validation of their ideas, but are often unwilling to adapt their “beautiful” models to the gritty realities of production deployments. Collaborations between industry and academia -- despite their deep interdependence -- are rare.
In this talk, we present our experience of a fruitful industry/academic collaboration. We describe how a “big idea” -- lineage-driven fault injection -- evolved from a theoretical model into an automated failure testing system that leverages Netflix’s state-of-the-art fault injection and tracing infrastructures. This collaboration required us to take risks, to accept defeats, and to constantly evolve our approach to “make it work”. We sketch the architecture of the automated failure testing system we built and some of its discoveries, while providing intuition for why it works. Along the way, we will describe the challenges (expected as well as unexpected, technical as well as ideological) that arose, and how we overcame them.
Speaker

Kolton Andrus
Founder of Gremlin Inc, former Netflix
Kolton is the founder of Gremlin Inc, which helps companies build more robust services. He was a Chaos Engineer at Netflix, focused on the resilience of the Edge services. He designed and built FIT, Netflix’s failure injection service. Prior to that, he improved the performance and reliability of the...
Speaker

Peter Alvaro
Computer Science Assistant Professor @UniversityofCalifornia
Peter Alvaro is an Assistant Professor of Computer Science at the University of California Santa Cruz. His research focuses on using data-centric languages and analysis techniques to build and reason about data-intensive distributed systems, in order to make them scalable, predictable and robust...