Speaker: Shriya Arora
Senior Software Engineer @Netflix
SESSION + Live Q&A
Taming Large State: Lessons From Building Stream Processing
Streaming engines like Apache Flink are redefining ETL and data processing. Data can be extracted, transformed, filtered, and written out in real time with an ease matching that of batch processing. However, the real challenge of matching the prowess of batch ETL lies in doing joins, maintaining state, and dynamically pausing or resetting the data.
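To make the claim concrete, here is a minimal sketch of that kind of streaming ETL, not the speaker's pipeline: events are read, filtered, transformed, and written out continuously. The socket source and print sink are illustrative stand-ins.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StreamingEtlSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.socketTextStream("localhost", 9999)   // extract: read raw events as they arrive
           .filter(line -> !line.isEmpty())       // filter: drop empty records
           .map(line -> line.toUpperCase())       // transform: per-record mapping
           .print();                              // load: write results downstream

        env.execute("streaming-etl-sketch");
    }
}
```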
At Netflix, micro-services serve and record many different kinds of user interactions with the product. Some of these live services generate millions of events per second, all carrying meaningful but often partial information. Things start to get exciting when the company wants to combine the events coming from one high-traffic micro-service with those from another. Joining these raw events generates rich datasets that are used to train the machine learning models that serve Netflix recommendations.
Historically, Netflix has done this joining of large-volume datasets in batch. Recently, the company asked: if the data is being generated in real time, why can't it be processed downstream in real time? Why wait a full day to get information from an event that was generated a few minutes ago?
This talk describes how we solved a complex join of two high-volume event streams at Netflix using Flink; a minimal sketch of the join pattern follows the list below. You'll learn about:
- Managing out-of-order events and processing late-arriving data
- Exploring keyed state for maintaining large state
- Fault tolerance of a stateful application
- Strategies for failure recovery
- Schema evolution in a stateful real-time application
- Data validation: batch vs. streaming
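The sketch below shows the general pattern, not Netflix's actual code: two keyed streams are connected, whichever side arrives first is buffered in Flink keyed state, and event-time watermarks allow out-of-order and late-arriving records to still match. The event types (PlayEvent, ImpressionEvent), the userId key, and the checkpoint interval are illustrative assumptions.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;

import java.time.Duration;

public class StreamJoinSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // snapshot keyed state periodically for failure recovery

        DataStream<PlayEvent> plays = env
                .fromElements(new PlayEvent("user-1", 1_000L))
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy.<PlayEvent>forBoundedOutOfOrderness(Duration.ofMinutes(5))
                                .withTimestampAssigner((e, ts) -> e.timestamp));

        DataStream<ImpressionEvent> impressions = env
                .fromElements(new ImpressionEvent("user-1", 2_000L))
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy.<ImpressionEvent>forBoundedOutOfOrderness(Duration.ofMinutes(5))
                                .withTimestampAssigner((e, ts) -> e.timestamp));

        plays.keyBy(e -> e.userId)
             .connect(impressions.keyBy(e -> e.userId))
             .process(new JoinFunction())
             .print();

        env.execute("stream-join-sketch");
    }

    /** Buffers whichever side arrives first in keyed state; emits a joined record when the other side shows up. */
    static class JoinFunction extends KeyedCoProcessFunction<String, PlayEvent, ImpressionEvent, String> {
        private transient ValueState<PlayEvent> pendingPlay;
        private transient ValueState<ImpressionEvent> pendingImpression;

        @Override
        public void open(Configuration parameters) {
            pendingPlay = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("pending-play", PlayEvent.class));
            pendingImpression = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("pending-impression", ImpressionEvent.class));
        }

        @Override
        public void processElement1(PlayEvent play, Context ctx, Collector<String> out) throws Exception {
            ImpressionEvent impression = pendingImpression.value();
            if (impression != null) {
                out.collect("joined: " + play.userId);
                pendingImpression.clear();
            } else {
                pendingPlay.update(play);
                // In practice a state TTL or an event-time timer evicts unmatched events so state stays bounded.
            }
        }

        @Override
        public void processElement2(ImpressionEvent impression, Context ctx, Collector<String> out) throws Exception {
            PlayEvent play = pendingPlay.value();
            if (play != null) {
                out.collect("joined: " + impression.userId);
                pendingPlay.clear();
            } else {
                pendingImpression.update(impression);
            }
        }
    }

    public static class PlayEvent {
        public String userId; public long timestamp;
        public PlayEvent() {}
        public PlayEvent(String userId, long timestamp) { this.userId = userId; this.timestamp = timestamp; }
    }

    public static class ImpressionEvent {
        public String userId; public long timestamp;
        public ImpressionEvent() {}
        public ImpressionEvent(String userId, long timestamp) { this.userId = userId; this.timestamp = timestamp; }
    }
}
```

The checkpointing call and the state-eviction comment hint at the fault-tolerance and large-state themes in the list above; how those are handled at Netflix's scale is the subject of the talk itself.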