Keynote: ETL is dead; long live streams

Location: Grand Ballroom ABC & Simulcast in Bayview AB

Abstract

What happens if you take everything that is happening in your company—every click, every database change, every application log—and make it all available as a real-time stream of well-structured data?

I will discuss the experience, at LinkedIn and elsewhere, of moving from batch-oriented ETL to real-time streams using Apache Kafka. I'll talk about how the design and implementation of Kafka were driven by the goal of acting as a real-time platform for event data. I will cover some of the challenges of scaling Kafka to hundreds of billions of events per day at LinkedIn while supporting thousands of engineers, applications, and data systems in a self-service fashion.

I will describe how real-time streams can become the source of ETL into Hadoop or a relational data warehouse, and how real-time data can supplement the role of batch-oriented analytics in Hadoop or a traditional data warehouse.

I will also describe how applications and stream processing systems such as Storm, Spark, or Samza can make use of these feeds for sophisticated real-time data processing as events occur.

Speaker: Neha Narkhede

Co-Creator Apache Kafka/Co-founder and CTO @Confluent

Neha Narkhede is co-founder and CTO at Confluent, a company backing the popular Apache Kafka messaging system. Before founding Confluent, Neha led streams infrastructure at LinkedIn, where she was responsible for LinkedIn's streaming platform built on top of Apache Kafka and Apache Samza. She is one of the initial authors of Apache Kafka and a committer and PMC member on the project.
