Presentation: Crisis to Calm: Story of Data Validation at Netflix

Track: Microservices / Serverless Patterns & Practices

Location: Ballroom A

Time: 4:10pm - 5:00pm

Level: Intermediate

Persona: Architect, Backend Developer

This presentation is now available to view on InfoQ.com

Abstract

The best outage is the one that never happens! Runtime system behavior is increasingly driven by data flowing from various data sources. Each data update can be as impactful as a code push, if not more so, which increases the risk of outages. This makes a strong case for automated detection of bad data, similar to what we already do for code pushes. To that end, we invested in detecting and preventing bad data in real time with techniques like circuit breakers and data canaries.
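To make the circuit-breaker idea concrete, here is a minimal Python sketch under stated assumptions. It is not Netflix's implementation: the `Snapshot`, `DataCircuitBreaker`, and `publish` names are hypothetical, and the only check it performs (rejecting a new data snapshot whose record count differs from the last published one by more than a configurable ratio) is an assumed stand-in for real validation rules. When the breaker trips, consumers keep the last known-good snapshot instead of receiving the suspect one.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical versioned data snapshot: record IDs mapped to payloads."""
    version: int
    records: dict[str, str]


class DataCircuitBreaker:
    """Blocks publication of a snapshot whose size changes too sharply.

    Assumption for illustration: a size-change ratio is the only check;
    real data validation would combine many signals.
    """

    def __init__(self, max_change_ratio: float = 0.1) -> None:
        self.max_change_ratio = max_change_ratio
        self.tripped = False

    def check(self, previous: Snapshot, candidate: Snapshot) -> bool:
        """Return True if candidate may be published; trip the breaker otherwise."""
        old_size = len(previous.records)
        new_size = len(candidate.records)
        if old_size == 0:
            # No baseline to compare against: only allow a non-empty snapshot.
            ok = new_size > 0
        else:
            change = abs(new_size - old_size) / old_size
            ok = change <= self.max_change_ratio
        self.tripped = not ok
        return ok


def publish(breaker: DataCircuitBreaker, previous: Snapshot, candidate: Snapshot) -> Snapshot:
    """Return the snapshot consumers should see: candidate if it passes, else previous."""
    if breaker.check(previous, candidate):
        return candidate
    # Breaker tripped: keep serving the last known-good snapshot.
    return previous


if __name__ == "__main__":
    breaker = DataCircuitBreaker(max_change_ratio=0.1)
    current = Snapshot(1, {f"title{i}": "ok" for i in range(100)})
    truncated = Snapshot(2, {f"title{i}": "ok" for i in range(50)})
    served = publish(breaker, current, truncated)
    print(served.version, breaker.tripped)
```

In this sketch a snapshot that suddenly loses half of its records is held back, while a small, expected change passes through. Failing closed (serving the previous version) trades data freshness for availability, which matches the abstract's goal of preventing bad data from reaching production rather than recovering from it afterwards.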

In this presentation, I will talk about the journey from having no data validations to our current set of techniques that are an essential part of availability at Netflix. I will share my experience in maintaining a great Netflix customer experience while enabling fast and safe data propagation.

Key takeaways:

  • Detecting and preventing bad data is essential to high availability.
  • Ways to make circuit breakers, data canaries and staggered rollouts effective.
  • Efficient validations via sharding data and isolating change.

Speaker: Lavanya Kanchanapalli

Senior Software Engineer @Netflix

Lavanya Kanchanapalli is a senior software engineer in Netflix’s Platform Engineering team designing and building core data infrastructure. She has worked on Netflix’s catalog infrastructure, which aggregates and propagates metadata for all the movies and shows on the service. Her latest project is building a scalable and cost-effective unified logging infrastructure for all the systems at Netflix.