Chaos Engineering

Modern software-based services are implemented as distributed systems with complex behavior and failure modes. Chaos engineering uses experimentation to build confidence in a system's availability. Netflix engineers have developed principles of chaos engineering that describe how to design and run experiments.

Chaos Engineering, in InfoQ. Retrieved 2/24/2018. https://www.infoq.com/articles/chaos-engineering 


Presentations

How to Test Your Fault Isolation Boundaries in the Cloud

Will my system keep working when a server fails? When a data center goes offline? When a service dependency is unavailable? Availability calculations for redundant components require that those components are independent and autonomous of each other. But modern-day systems are complex, exhibiting...

Jason Barto Principal Solutions Architect @AWS

Chaos Engineering Observability with Visual Metaphors

Observability is key in operating a system in production; it’s required during an incident, when an operator has to interrogate, inspect, and piece together what happened to avoid a similar event. In those scenarios, chaos engineering and observability are closely connected - providing...

Yury Niño Roa Cloud Infrastructure Engineer @Google

The Scientific Method for Testing System Resilience

Do you remember the Scientific Method from elementary school science class? It's time to dust off that knowledge and use it to your advantage to test your IT systems! In this session, you'll be re-introduced to the Scientific Method, and learn how Vanguard's software engineers and IT...

Christina Yakomin Senior Site Reliability Engineering Specialist @Vanguard_Group

Interviews

Yury Niño Roa Cloud Infrastructure Engineer @Google

Chaos Engineering Observability with Visual Metaphors

What is the focus of your work these days?

I am a Cloud Infrastructure Engineer at Google. Although I interact with partners, clients and sales teams, my work is very technical; my daily activities include implementing Infrastructure and AppDev solutions in GCP. Every day I am practicing and getting experience with DevOps, SRE, Application Development, Developer Operations,...

Christina Yakomin Senior Site Reliability Engineering Specialist @Vanguard_Group

The Scientific Method for Testing System Resilience

Christina, what is the focus of your work these days?

Right now, my primary focus is the staffing, onboarding and subsequent education of site reliability engineers for Vanguard. So I handle everything from what it means to be a site reliability engineer in the day-to-day, what tools and technologies they'll need to be familiar with and how to best get them up to speed. But also on...
