SESSION + Live Q&A

An Engineer's Guide to a Good Night's Sleep

As organisations look to empower engineers more and embrace DevOps practices, the support role has changed considerably too. Developers are moving from being purely third-line support to working more collaboratively with engineers and operational staff. And as we move to cloud-native microservice solutions, the increased complexity and diversity of our production landscape means operational staff may well rely more heavily on the engineers, particularly out of hours.

 

I have spent the last 18 years working across many industries with a wide range of technologies and approaches. Having worked on everything from trading applications to content enrichment APIs, I have seen many approaches and processes that try to minimise operational support for developers.

 

In this talk, I will explore some of my top approaches and techniques for reducing the risk of that dreaded 3am call. You will gain practical insight into how to handle failure in today's more complex distributed microservice systems. This will include approaches to resiliency, understanding your system, understanding the requirements for fault tolerance, and the developer mindset all of this requires. I will pepper the talk with real-world examples, and the occasional war story along the way too.


Speaker

Nicky Wrightson

Ventures CTO @blenheimchalcot

Nicky has worked as an engineer for over 20 years across many industries. She is currently working as Ventures CTO for Blenheim Chalcot, a venture builder which believes in investing more than just funds, but also knowledge and experience, ideas and infrastructure to build new sustainable...

Location

Churchill, G flr.

Track

Architecting for Failure: Chaos, Complexity, and Resilience

Topics

Site Reliability Engineering, London

From the same track

SESSION + Live Q&A Serverless

Building Resilient Serverless Systems

In this brave new world of serverless, we entrust our vendors with keeping the infrastructure up and running. However, when even cloud behemoths like Amazon Web Services and Google Cloud have outages and failures, how can we build resilient systems? John Chapin explains how to use...

Johnathan Chapin

Cloud Technology Consultant with an expertise in Serverless Computing

SESSION + Live Q&A Interview Available

Learning From Chaos: Architecting for Resilience

In this talk Russ Miles, CEO of ChaosIQ, will share how leading organisations are successfully adopting chaos engineering to encourage a mindset of "architecting for resilience". Through chaos engineering, architects are able to establish a true "learning system" where everyone is involved in...

Russell Miles

CEO of @chaosiqio

SESSION + Live Q&A Chaos Engineering

How Condé Nast Succeeds by a Culture That Embraces Failure

Systems architectures are increasingly diverse to serve the growing demands for scalability, fault tolerance, isolation, and extensibility. But the trade-off is ever more complex software to operate and maintain, often with no single shared view of the entire design. This is especially true with the...

Crystal Hirschorn

VP Engineering, Global Strategy & Operations @CondeNast

SESSION + Live Q&A Site Reliability Engineering

Amplifying Sources of Resilience: What Research Says

Building robust software systems means anticipating how failures may occur with components and subsystems and developing answers to the question: "What is needed for the design of systems that prevents or limits catastrophic failure?" Investing in, developing, and...

John Allspaw

DevOps/Resilience Engineering Thought Leader, Previously CTO @Etsy & Co-founder of @AdaptiveCLabs
