Track: Stream Processing In The Modern Age

Location: Bayview AB

Day of week:

Stream processing pipelines have become essential to building engaging experiences on the web today. Whether you enjoy personalized news feeds on LinkedIn and Facebook, profit from near real time updates to search engines and recommender systems, or benefit from near-realtime fraud detection on a lost or stolen credit card, you have come to rely on the fruits of stream processing as an end user. As a ops-focused engineer, you may employ stream processing to understand complex call trees in your microservice-based infrastructure with the aim to eliminate redundant system load or improve mobile and web application performance. As an business analyst, you may want to execute a SQL query on a live stream of data to reveal some insights. Come learn about interesting applications of stream processing as well as recent advances in the field.

Track Host: Tyler Akidau

Engineer @Google & Founder/Committer on Apache Beam

Tyler Akidau is a senior staff software engineer at Google Seattle. He leads technical infrastructure’s internal data processing teams in Seattle (MillWheel & Flume), is a founding member of the Apache Beam PMC, and has spent the last seven years working on massive-scale data processing systems. Though deeply passionate and vocal about the capabilities and importance of stream processing, he is also a firm believer in batch and streaming as two sides of the same coin, with the real endgame for data processing systems the seamless merging between the two. He is the author of the 2015 Dataflow Model paper, the Streaming 101 and Streaming 102 articles, and the upcoming Streaming Systems book. His preferred mode of transportation is by cargo bike, with his two young daughters in tow.

Custom, Complex Windows @Scale Using Apache Flink

100 Million members in over 190 countries leads to more than 1 Trillion events and 3 PB of data flowing through Netflix’s real-time data infrastructure each day. We’ve built a data pipeline in the cloud that reliably collects and routes these events to a variety of sinks. The data in these events are are used in several ways; from personalizing the customer experience to business intelligence.

The windowing capabilities offered by most stream processing engines are limited to aligned windows of a fixed duration. However, many real-world event processing use cases don’t fit this rigid structure, resulting in awkward processing pipelines. There haven’t been good alternatives, until recently that is. Apache Flink* offers a rich Window API that supports implementing unaligned windows of varying duration. In this talk, Matt Zimmer will discuss using this API to aggregate events into windows customized along varying definitions of a session. He will talk about implementation details such as:

Handling out-of-order events
Limiting state build-up while aggregating a subset of events from an event stream
Periodically emitting early results
Creating windows bounded by a type of event

Attendees will leave this talk with practical techniques and knowledge to implement their own custom windows in Apache Flink.

* Apache Flink (https://flink.apache.org/) is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications.

Matt Zimmer, Real-time Data Infrastructure Senior Engineer @Netflix

Predictive Datacenter Analytics With Strymon

A modern enterprise datacenter is a complex, multi-layered system whose components often interact in unpredictable ways. Yet, to keep operational costs low and maximize efficiency, we would like to foresee the impact of changing workloads, updating configurations, modifying policies, or deploying new services.

In this talk, I will share our research group’s ongoing work on Strymon: a system for predicting datacenter behavior in hypothetical scenarios using queryable online simulation. Strymon leverages existing logging and monitoring pipelines of modern production datacenters to ingest cross-layer events in a streaming fashion and predict possible effects of such events in what-if scenarios. Predictions are made online by simulating the hypothetical datacenter state alongside the real one. Driven by a real-use case from our industrial partners, I will highlight the challenges we are facing in building Strymon to support a diverse set of data representations, input sources, query languages, and execution models.

Finally, I will share our initial design decisions and give an overview of Timely Dataflow; a high-performance distributed streaming engine and our platform of choice for Strymon’s core implementation.

Vasia Kalavri, PMC Member of Apache Flink (Core Developer Graph Processing API) & Postdoctoral researcher at the ETH Zurich Systems group

Streaming SQL Foundation: Why I ❤ Streams+Tables

What does it mean to execute robust streaming queries in SQL? What is the relationship of streaming queries to classic relational queries? Are streams and tables the same thing conceptually, or different? And how does all of this relate to the programmatic frameworks like we’re all familiar with? This talk will address all of those questions in two parts.

First, we’ll explore the relationship between the Beam Model (as described in The Dataflow Model paper and the Streaming 101 and Streaming 102 blog posts) and stream & table theory (as popularized by Martin Kleppmann and Jay Kreps, amongst others, but essentially originating out of the database world). It turns out that stream & table theory does an illuminating job of describing the low-level concepts that underlie the Beam Model.

Second, we’ll apply our clear understanding of that relationship towards explaining what is required to provide robust stream processing support in SQL. We’ll discuss concrete efforts that have been made in this area by the Apache Beam, Calcite, and Flink communities, compare to other offerings such as Apache Kafka’s KSQL and Apache Spark’s Structured streaming, and talk about new ideas yet to come.

In the end, you can expect to have a much better understanding of the key concepts underpinning data processing, regardless of whether that data processing batch or streaming, SQL or programmatic, as well as a concrete notion of what robust stream processing in SQL looks like.

Tyler Akidau, Engineer @Google & Founder/Committer on Apache Beam

The Power of Distributed Snapshots in Apache Flink

Come learn how Apache Flink is handles stateful stream processing and how to manage distributed stream processing and data driven applications efficiently with Flink's checkpoints and savepoints.

Over the last years, data stream processing has redefined how many of us build data pipelines. Apache Flink is one of the systems at the forefront of that development: With its versatile APIs (event-time streaming, Stream SQL, events/state) and powerful execution model, Flink has been part of re-defining what stream processing can do. By now, Apache Flink powers some of the largest data stream processing pipelines in open source data stream processing. Ranging from batch and streaming pipelines and analytics to microservices and applications, Flink has been used for a wide range of applications that can be unified under the paradigm of data stream processing. A key ingredient to that flexibility is Flink's handling of Streams and State. In the talk we will show how these are handled in Flink today: The types of state, why we picked distributed snapshots as the core consistency model, and how these checkpoints/savepoints form an increadibly powerful base to manage applications, including upgrades, rollbacks, reinstatements, migrations, forking, or blue/green deployments. Demo included.

Stephan Ewen, Committer @ApacheFlink, CTO @dataArtisans

Data Decisions With Realtime Stream Processing

At Facebook, we can move fast and iterate because of our ability to make data-driven decisions. Data from our stream processing systems provide real-time data analytics and insights; the system is also implemented into various Facebook products, which have to aggregate data from many sources. In this talk, we cover:

the difficulties of stream processing at scale
the solutions we've created to date
three case studies on improving the time to deliver insights with data via stream processing

Our case studies include examples from search product development, accelerating daily pipelines in the Data Warehouse, and seamless integration with our machine learning platforms. Each case study shows how we can deliver value to more teams while continuing to abstract the details of stream processing from various teams at Facebook. We conclude by speaking to the future of stream processing.

Serhat Yilmaz, Software Engineer @Facebook

Panel: SQL Over Streams, Ask the Experts

Queries over streams are generally "continuous," executing for long periods of time and returning incremental results. Yet operations over streams must have the ability to be monotonic. New Generation of Stream Processing Engines has added support for Stream SQL. This AMA / panel features a discussion with thought leaders evolving and shaping the space.

Julian Hyde, Original Developer @ApacheCalcite, Co-Founder SQLstream, & Architect @Hortonworks
Tyler Akidau, Engineer @Google & Founder/Committer on Apache Beam
Jay Kreps, Co-Founder and CEO @Confluent
Michael Armbrust, Initial Author of Apache Spark SQL & Leads Streaming Team @Databricks
Stephan Ewen, Committer @ApacheFlink, CTO @dataArtisans

Last Year's Tracks

Monday, 1 November
Microservices / Serverless Patterns & Practices

Evolving, observing, persisting, and building modern microservices
Practices of DevOps & Lean Thinking

Practical approaches using DevOps & Lean Thinking
JavaScript & Web Tech

Beyond JavaScript in the Browser. Exploring WebAssembly, Electron, & Modern Frameworks
Modern CS in the Real World

Thoughts pushing software forward, including consensus, CRDT's, formal methods, & probabilistic programming
Modern Operating Systems

Applied, practical, & real-world deep-dive into industry adoption of OS, containers and virtualization, including Linux on Windows, LinuxKit, and Unikernels
Optimizing You: Human Skills for Individuals

Better teams start with a better self. Learn practical skills for IC
Open Spaces
Tuesday, 2 November
Architectures You've Always Wondered About

Next-gen architectures from the most admired companies in software, such as Netflix, Google, Facebook, Twitter, & more
21st Century Languages

Lessons learned from languages like Rust, Go-lang, Swift, Kotlin, and more.
Emerging Trends in Data Engineering

Showcasing DataEng tech and highlighting the strengths of each in real-world applications.
Bare Knuckle Performance

Killing latency and getting the most out of your hardware
Socially Conscious Software

Building socially responsible software that protects users privacy & safety
Delivering on the Promise of Containers

Runtime containers, libraries, and services that power microservices
Open Spaces
Wednesday, 3 November
Applied AI & Machine Learning

Applied machine learning lessons for SWEs, including tech around TensorFlow, TPUs, Keras, PyTorch, & more
Production Readiness: Building Resilient Systems

More than just building software, building deployable production ready software
Developer Experience: Level up your Engineering Effectiveness

Improving the end to end developer experience - design, dev, test, deploy, operate/understand.
Security: Lessons Attacking & Defending

Security from the defender's AND the attacker's point of view
Future of Human Computer Interaction

IoT, voice, mobile: Interfaces pushing the boundary of what we consider to be the interface
Enterprise Languages

Workhorse languages found in modern enterprises. Expect Java, .NET, & Node in this track

Schedule

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Track: Stream Processing In The Modern Age

Location: Bayview AB

Day of week:

Track Host: Tyler Akidau

Custom, Complex Windows @Scale Using Apache Flink

Predictive Datacenter Analytics With Strymon

Streaming SQL Foundation: Why I ❤ Streams+Tables

The Power of Distributed Snapshots in Apache Flink

Data Decisions With Realtime Stream Processing

Panel: SQL Over Streams, Ask the Experts

Last Year's Tracks

Monday, 1 November

Microservices / Serverless Patterns & Practices

Practices of DevOps & Lean Thinking

JavaScript & Web Tech

Modern CS in the Real World

Modern Operating Systems

Optimizing You: Human Skills for Individuals

Open Spaces

Tuesday, 2 November

Architectures You've Always Wondered About

21st Century Languages

Emerging Trends in Data Engineering

Bare Knuckle Performance

Socially Conscious Software

Delivering on the Promise of Containers

Open Spaces

Wednesday, 3 November

Applied AI & Machine Learning

Production Readiness: Building Resilient Systems

Developer Experience: Level up your Engineering Effectiveness

Security: Lessons Attacking & Defending

Future of Human Computer Interaction

Enterprise Languages

Follow QCon

Contact

Menu

QCons around the World