Chaos Engineering
Past Presentations
Chaos Architecture
Perfectly engineered resilient systems may be broken by confused operators when the systems behave differently in response to underlying failures. Highly available applications need to be resilient to failures in infrastructure, networks, applications and operators. Chaos engineering is needed to...
Chaos: The Last Stand Against Our Robot Overlords
As the complexity and criticality of our software systems rapidly increase, our ability and available methodologies to ensure their determinism and correctness are often nascent or sometimes even non-existent. We see the effects of this paradox as we advance the role and responsibility of...
Failure at Netflix Velocity
Netflix is a strong believer in Chaos Engineering and the Velocity of Innovation. Most of the time, our customers never notice the former and appreciate the latter. Occasionally, however… Cannot connect to Netflix. You press play and it doesn't work. You can't log in. Nothing is on the screen...
Chaos Engineering on a Budget
As the systems that support internet-scale services grow larger and ever more complex, chaos engineering has emerged as an industry best practice for ensuring system resiliency. Many companies maintain entire teams devoted to chaos testing their product. But what can you do if you don't have these...
Designing Services for Resilience Testing @Netflix
As an industry, we focus on designing microservices for availability. However, we don’t tend to speak about enabling these same services for resiliency testing. In a perfect world, you wouldn’t need resiliency testing, but that’s not the reality we are currently facing. This talk covers...
The Art of Chaos Engineering Panel
Interviews
Failure at Netflix Velocity
What do you do day-to-day?
The majority of our time is focused on education, training, and follow up with other teams. We help them with instrumentation, metrics, actionable alerts, and best practices. The focus is really education and preparation.