Chaos Engineering is an emerging discipline, but its underlying concepts are not new. Failure is going to happen. Are you ready? Put simply, Chaos Engineering is one approach to “breaking things on purpose” that teaches us new information about our systems through experimentation. By triggering incidents intentionally and in a controlled way, we gain confidence that our systems can deal with those failures before they occur in production. Come learn from those just starting this journey as well as from the experts pushing the state of the art. We will hear war stories from those putting out the fires in the middle of the night, as well as those starting the fires during the day! In the end, we’ll learn how to build systems and organizations that improve in the face of failure.
Track: The Art of Chaos Engineering
Location: Ballroom BC
Track Host: Kolton Andrus
Kolton is the founder of Gremlin, which helps companies build more robust services. He was a Chaos Engineer at Netflix, focused on the resilience of the Edge services. He designed and built FIT, Netflix’s failure injection service. Before that, he improved the performance and reliability of the Amazon Retail website. At both companies he served as a ‘Call Leader’, managing the resolution of company-wide incidents. Kolton is passionate about building resilient systems, primarily because it lets him break things for fun and profit.
Chaos Architecture
Even perfectly engineered resilient systems may be broken by confused operators when those systems behave differently in response to underlying failures. Highly available applications need to be resilient to failures in infrastructure, networks, applications, and operators. Chaos engineering is needed to exercise the incident handling mechanisms at every level, including people and processes. This talk will look at best practices and challenges in getting to a chaos architecture mindset.
Chaos: The Last Stand Against Our Robot Overlords
As the complexity and criticality of our software systems rapidly increase, our ability and the methodologies available to ensure their determinism and correctness are often nascent or sometimes even non-existent. We see the effects of this paradox as we advance the role and responsibility of software in society. Often the evidence is observed in service outages, security breaches, financial market "flash crashes", and now the ever-shortening length of time between the development and eventual production of autonomous vehicles.
The pursuit of automating aspects of our lives is often stifled simply by chaos, that is, our best-laid plans coming in contact with the unexpected. An essential element of working with the chaos present in every system is to first be able to characterize it effectively. Chaos Engineering and chaos experiments on the complex data, interfaces, and algorithms used in autonomous vehicles should be a minimum requirement in validating operational safety. Taking it a step further, Chaos Engineering could be the beginning of bringing to Software Engineering the kind of determinism, predictability, and assurance we often take for granted every day from disciplines like Structural, Mechanical, and Electrical Engineering. We need to begin to shift towards working with chaos instead of against it, in order to build safe, reliable, and increasingly deterministic complex systems. How we engineer software for large-scale consumption is shifting from "It might work, but I wouldn't bet my life on it" to "I know this will work; I'd bet my life on it."
Failure at Netflix Velocity
Netflix is a strong believer in Chaos Engineering and the Velocity of Innovation. Most of the time, our customers never notice the former and appreciate the latter. Occasionally however…
Can not connect to Netflix. You press play and it doesn't work. You can't log in. Nothing is on the screen and Stranger Things Season 2 just released!
This talk offers a behind-the-scenes look at how Netflix engineering teams think about failure: the tools, techniques, and training we use to shorten the inevitable failures of our systems and reduce their impact on our customers. Come hear why we believe chaos is your friend, failure is guaranteed, and why our organization is better off having both.
Chaos Engineering on a Budget
As the systems that support internet-scale services grow larger and ever more complex, chaos engineering has emerged as an industry best practice for ensuring system resiliency. Many companies maintain entire teams devoted to chaos testing their product. But what can you do if you don't have these kinds of resources to devote to the problem? How can you get started with chaos engineering without hiring an entire team of experts?
This is the story of implementing chaos testing on a small product, and how several small and targeted early investments in chaos engineering saved huge amounts of time and effort down the road.
The Art of Chaos Engineering Panel
Willie Wheeler, Principal Application Engineer @Expedia
Sahar Samiei, Senior Product Manager @Expedia
Nathan Äschbacher
Dave Hahn, Sr SRE, Reliability and Chaos Engineering @Netflix
Adrian Cockcroft, VP Cloud Architecture Strategy @AWSCloud & Microservices Pioneer
Heather Nakama, Software Engineer @Microsoft - Azure Search
Expedia’s Journey Toward Site Resiliency
Those coming from product-driven organizations—where product features are often prioritized over resiliency-related concerns—will understand how challenging it can be to convince teams to do resiliency work. In this presentation we’ll share Expedia’s resiliency journey, starting with resiliency as an afterthought and progressing toward resiliency as a first-class concern. Attendees will learn about the importance of partnering with the teams experiencing operational struggles, and equipping them with the data to make the right investments at the right time.
Sahar Samiei, Senior Product Manager @Expedia
Last Year's Tracks
Monday, 1 November
- Microservices / Serverless Patterns & Practices
  Evolving, observing, persisting, and building modern microservices
- Practices of DevOps & Lean Thinking
  Practical approaches using DevOps & Lean Thinking
- JavaScript & Web Tech
  Beyond JavaScript in the Browser. Exploring WebAssembly, Electron, & Modern Frameworks
- Modern CS in the Real World
  Thoughts pushing software forward, including consensus, CRDTs, formal methods, & probabilistic programming
- Modern Operating Systems
  Applied, practical, & real-world deep dive into industry adoption of OS, containers and virtualization, including Linux on Windows, LinuxKit, and Unikernels
- Optimizing You: Human Skills for Individuals
  Better teams start with a better self. Learn practical skills for IC
- Open Spaces
Tuesday, 2 November
- Architectures You've Always Wondered About
  Next-gen architectures from the most admired companies in software, such as Netflix, Google, Facebook, Twitter, & more
- 21st Century Languages
  Lessons learned from languages like Rust, Go, Swift, Kotlin, and more.
- Emerging Trends in Data Engineering
  Showcasing DataEng tech and highlighting the strengths of each in real-world applications.
- Bare Knuckle Performance
  Killing latency and getting the most out of your hardware
- Socially Conscious Software
  Building socially responsible software that protects users' privacy & safety
- Delivering on the Promise of Containers
  Runtime containers, libraries, and services that power microservices
- Open Spaces
Wednesday, 3 November
- Applied AI & Machine Learning
  Applied machine learning lessons for SWEs, including tech around TensorFlow, TPUs, Keras, PyTorch, & more
- Production Readiness: Building Resilient Systems
  More than just building software, building deployable production-ready software
- Developer Experience: Level up your Engineering Effectiveness
  Improving the end-to-end developer experience: design, dev, test, deploy, operate/understand.
- Security: Lessons Attacking & Defending
  Security from the defender's AND the attacker's point of view
- Future of Human Computer Interaction
  IoT, voice, mobile: Interfaces pushing the boundary of what we consider to be the interface
- Enterprise Languages
  Workhorse languages found in modern enterprises. Expect Java, .NET, & Node in this track