Presentation: Crisis to Calm: Story of Data Validation at Netflix
This presentation is now available to view on InfoQ.com
Abstract
The best outage is the one that never happens! Runtime system behavior is increasingly driven by data flowing from various data sources. Each data update can be as impactful as a code push, if not more so, which increases the risk of outages. This makes a strong case for automated detection of bad data, similar to what we already do for code pushes. To that end, we invested in detecting and preventing bad data in real time with techniques like circuit breakers and data canaries.
In this presentation, I will talk about the journey from having no data validations to our current set of techniques that are an essential part of availability at Netflix. I will share my experience in maintaining a great Netflix customer experience while enabling fast and safe data propagation.
Key takeaways:
- Detecting and preventing bad data is essential to high availability.
- Ways to make circuit breakers, data canaries and staggered rollout effective.
- Efficient validations via sharding data and isolating change.
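To make the circuit breaker idea concrete, here is a minimal sketch of a data circuit breaker that refuses a suspicious update and keeps serving the last known good data. It is an illustration only, not the implementation described in the talk: the class name DataCircuitBreaker, the max_shrink_ratio threshold, and the record-count check are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DataCircuitBreaker:
    """Hypothetical guard that keeps the last known good dataset when an update looks bad."""

    # Assumed threshold: reject an update that drops more than this fraction of records.
    max_shrink_ratio: float = 0.5
    # The dataset currently being served (last known good).
    current: Dict[str, str] = field(default_factory=dict)

    def validate(self, candidate: Dict[str, str]) -> Optional[str]:
        """Return a rejection reason, or None if the candidate looks safe."""
        if not candidate:
            return "candidate is empty"
        if self.current:
            lost = len(self.current) - len(candidate)
            if lost > len(self.current) * self.max_shrink_ratio:
                return (
                    f"record count dropped from {len(self.current)} "
                    f"to {len(candidate)}"
                )
        return None

    def propose(self, candidate: Dict[str, str]) -> bool:
        """Apply the candidate if it passes validation; otherwise keep current data."""
        reason = self.validate(candidate)
        if reason is not None:
            # Breaker trips: the bad update is dropped and current data stays in place.
            return False
        self.current = dict(candidate)
        return True


if __name__ == "__main__":
    breaker = DataCircuitBreaker()
    print(breaker.propose({"a": "1", "b": "2", "c": "3", "d": "4"}))
    print(breaker.propose({"a": "1"}))
    print(len(breaker.current))
```

In this sketch a rejected update is simply discarded. A real system would presumably also alert an owner and offer a manual override, since a large drop in records is sometimes legitimate.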