Presentation: The Power of Distributed Snapshots in Apache Flink

Track: Stream Processing In The Modern Age

Location: Bayview AB

Level: Intermediate

Persona: Architect, CTO/CIO/Leadership, Data Engineering, Data Scientist, Developer

Abstract

Come learn how Apache Flink handles stateful stream processing, and how to manage distributed stream processing and data-driven applications efficiently with Flink's checkpoints and savepoints.

Over the past few years, data stream processing has redefined how many of us build data pipelines. Apache Flink is one of the systems at the forefront of that development: with its versatile APIs (event-time streaming, Stream SQL, events/state) and powerful execution model, Flink has been part of redefining what stream processing can do. By now, Apache Flink powers some of the largest pipelines in open source data stream processing. Flink has been used for a wide range of applications, from batch and streaming pipelines and analytics to microservices, all of which can be unified under the paradigm of data stream processing. A key ingredient of that flexibility is Flink's handling of streams and state. In the talk we will show how these are handled in Flink today: the types of state, why we picked distributed snapshots as the core consistency model, and how these checkpoints/savepoints form an incredibly powerful base for managing applications, including upgrades, rollbacks, reinstatements, migrations, forking, and blue/green deployments. Demo included.

Speaker: Stephan Ewen

Committer @ApacheFlink, CTO @dataArtisans

Stephan Ewen is a PMC member and one of the original creators of Apache Flink, and co-founder and CTO of data Artisans (data-artisans.com). He holds a Ph.D. from the Berlin University of Technology.
