Speaker: Rafael de Elvira Tellez
(He / him / his)
Senior Software Engineer @Slack
Outside work, Rafa enjoys spending time in the mountains with his friends (climbing, hiking, and mountain biking), as well as spending time with his pets and cooking at home.
Session + Live Q&A
Slack’s DNSSEC Rollout: Third Time’s the Outage
We all have to manage DNS. DNS changes are inherently high-blast-radius and high-visibility.
We present a case study of what happened when a large SaaS company enabled DNSSEC. We did significant planning and testing beforehand. The rollout went smoothly for most of our domains, but one domain caused problems. We attempted three times to enable DNSSEC on this domain. Twice we rolled back after a partial rollout because of actual (or suspected) customer impact.
On the third occasion, we rolled out DNSSEC fully and then determined that the change had broken a small subset of our customers. While attempting to roll back… we made it worse. This talk will describe what happened.