Presentation: The Power of Distributed Snapshots in Apache Flink
Abstract
Come learn how Apache Flink handles stateful stream processing, and how Flink's checkpoints and savepoints let you manage distributed stream processing and data-driven applications efficiently.
Over the last few years, data stream processing has redefined how many of us build data pipelines. Apache Flink is one of the systems at the forefront of that development: with its versatile APIs (event-time streaming, Stream SQL, events/state) and powerful execution model, Flink has helped redefine what stream processing can do. Today, Apache Flink powers some of the largest data stream processing pipelines built on open source software. From batch and streaming pipelines and analytics to microservices and applications, Flink has been used for a wide range of use cases that can be unified under the paradigm of data stream processing. A key ingredient of that flexibility is Flink's handling of streams and state. In this talk we will show how these are handled in Flink today: the types of state, why we picked distributed snapshots as the core consistency model, and how checkpoints and savepoints form an incredibly powerful basis for managing applications, including upgrades, rollbacks, reinstatements, migrations, forking, and blue/green deployments. Demo included.