Chaos Engineering
Past Presentations
Failure at Netflix Velocity
Netflix is a strong believer in Chaos Engineering and the Velocity of Innovation. Most of the time, our customers never notice the former and appreciate the latter. Occasionally, however… Cannot connect to Netflix. You press play and it doesn't work. You can't log in. Nothing is on the screen...
Chaos Engineering on a Budget
As the systems that support internet-scale services grow larger and ever more complex, chaos engineering has emerged as an industry best practice for ensuring system resiliency. Many companies maintain entire teams devoted to chaos testing their product. But what can you do if you don't have these...
Designing Services for Resilience Testing @Netflix
As an industry, we focus on designing microservices for availability. However, we don’t tend to speak about enabling these same services for resiliency testing. In a perfect world, you wouldn’t need resiliency testing, but that’s not the reality we are currently facing. This talk covers...
The Art of Chaos Engineering Panel
Expedia’s Journey Toward Site Resiliency
Those coming from product-driven organizations—where product features are often prioritized over resiliency-related concerns—will understand how challenging it can be to convince teams to do resiliency work. In this presentation we’ll share Expedia’s resiliency journey, starting with...
Whispers in the Chaos: Monitoring Weak Signals
The complexity of the socio-technical systems we engineer, operate, and exist within is staggering. Because we interact with our systems daily and know them well, the true gravity of this complexity can become easy to ignore. (And, let's face it, ignoring it is a good coping strategy, too!) When...
Interviews
Failure at Netflix Velocity
What do you do day-to-day?
The majority of our time is spent on education, training, and follow-up with other teams. We help them with instrumentation, metrics, actionable alerts, and best practices. The focus is really education and preparation.