© 2009-2023 Strange Loop

Subjecting systems to failures is supposed to increase confidence in their stability. But why? How do you form failure hypotheses? How do you reason about their safety? Why should your organization listen to you and invest in testing your failure hypotheses?
These are some of the questions I faced during my quest to improve production stability at work. In this talk, we will discuss three questions:
Subbu Allamaraju is a senior technologist at the Expedia Group, where he is leading a large-scale migration of Expedia Group's travel platforms from enterprise data centers to a highly available architecture in the cloud. Subbu is a well-rounded engineer and influencer with hands-on experience in software development, architecture, distributed systems, services, internet protocols, operations, and the cloud. Over the past several years, he has helped build and empower several engineering and operations teams in these areas.