SESSION + Live Q&A
Avoiding Alerts Overload From Microservices
Microservices can be a great way to work: the services are simple, you can use the right technology for the job, and deployments become smaller and less risky. Unfortunately, other things become more complex. You probably took some time to design a deployment pipeline and set up self-service provisioning, for example. But did the rest of your thinking about what “done” means catch up? Are you still setting up alerts, run books, and monitoring for each microservice as though it was a monolith?
Two years ago, a team at the FT started out building a microservices-based system from scratch. Their initial naive approach to monitoring meant that an underlying network issue could mean 20 people each receiving 10,000 alert emails overnight. With that volume, you can’t pick out the important stuff. In fact, your inbox is unusable unless you have everything filtered away where you’ll never see it. Furthermore, you have information radiators all over the place, but there’s always something flashing or the wrong colour. You can spend the whole day moving from one attention-grabbing screen to another.
That team now has over 150 microservices in production. So how did they get themselves out of that mess and regain control of their inboxes and their time? First, you have to work out what’s important, and then you have to ruthlessly narrow down on that. You need to be able to see only the things you need to take action on, in a way that tells you exactly what you need to do. Sarah shares how her team regained control and offers some tips and tricks.
Speaker
Sarah Wells
Former Tech Director for Engineering Enablement @FT (Financial Times)
Sarah is a technology leader, consultant and conference speaker with a focus on microservices, engineering enablement, observability and devops. She has over 20 years' experience as a developer, principal engineer and tech director across product, platform, SRE and devops teams. Sarah spent...
From the same track
Do You Really Know Your Response Times?
With the recent surge in highly available microservices with high incoming traffic, it is becoming more and more important to know how your service is performing right now and to be able to diagnose issues in production quickly. It took a while for us to understand how to produce meaningful...
Daniel Rolls
Collecting and Interpreting Large-Scale Data Collected @SkyUK
Monitoring Serverless Architectures
Serverless architectures are attracting more and more interest from IT professionals and companies hoping to lower the costs of creating and operating distributed systems without constantly worrying about availability, scalability and capacity management. Despite all the attractive properties...
Rafal Gancarz
Lead Consultant @OpenCredo
After Acceptance: Reasoning About System Outputs
Modern software development allows us to prove that new work is functionally complete. We write a set of executable specifications. We automatically execute them in the form of acceptance tests as part of our continuous delivery pipeline. When all the tests pass, we are done! This approach is...
Dr. Stefanos Zachariadis
Senior Software Engineer
Observability, Event Sourcing and State Machines
What is a way to have complete transparency of the state of a service? Ideally we would record everything - the inputs, outputs and timings - in order to capture highly reproducible and transparent state changes. However, is it possible to record every event or message in and out of a service...
Peter Lawrey
CEO @Chronicle_SW