10:35am - 11:25am
The current generation of data engineering has left us with data pipelines, data warehouses, and machine learning platforms that are largely batch-based and centrally managed. They're often manually operated, and integrating new systems can be cumbersome. Over the next few years, a number of trends are going to require us to rethink how and what we build. Data is now real time, companies are running many database technologies, teams are demanding more control of their data, and regulatory policy has begun dictating how and when we store data. This talk will present a vision of what it will take for data engineers to deliver a next-generation data ecosystem.
11:50am - 12:40pm
Many enterprises are investing in their next generation data platform, with the hope of democratizing data at scale to provide business insights and ultimately make automated intelligent decisions. Data platforms based on the data lake architecture have common failure modes that lead to unfulfilled promises at scale.
In this talk, Zhamak shares her observations on the failure modes of the centralized paradigm of the data lake, and of its predecessor, the data warehouse.
She introduces Data Mesh, the next generation of data platforms, which shifts to a paradigm drawn from modern distributed architecture: considering domains as the first-class concern, applying platform thinking to create self-serve data infrastructure, and treating data as a product.
1:40pm - 2:30pm
Streaming engines like Apache Flink are redefining ETL and data processing. Data can be extracted, transformed, filtered, and written out in real time with an ease matching that of batch processing. However, the real challenge of matching the prowess of batch ETL remains in doing joins, maintaining state, and dynamically pausing or resting the data.
At Netflix, micro-services serve and record many different kinds of user interactions with the product. Some of these live services generate millions of events per second, all carrying meaningful but often partial information. Things start to get exciting when the company wants to combine the events coming from one high-traffic micro-service with those coming from another. Joining these raw events generates rich datasets that are used to train the machine learning models that serve Netflix recommendations.
Historically, Netflix has done this joining of large-volume datasets in batch. Recently, the company asked: if the data is being generated in real time, why can't it be processed downstream in real time? Why wait a full day to get information from an event that was generated a few minutes ago?
This talk describes how we solved a complex join of two high-volume event streams at Netflix using Flink; a minimal sketch of such a keyed join follows the list below. You'll learn about:
- Managing out-of-order events and processing late-arriving data
- Using keyed state to maintain large state
- Fault tolerance of a stateful application
- Strategies for failure recovery
- Schema evolution in a stateful real-time application
- Data validation: batch vs. streaming
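To make the join concrete, here is a minimal Flink sketch rather than the actual Netflix pipeline: it assumes two hypothetical event streams (impressions and playbacks) that share a session ID as the join key, buffers whichever side arrives first in keyed state, and emits a joined record once the other side shows up. Late-data handling, timers for expiring unmatched state, and schema evolution, all covered in the talk, are omitted here.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;

public class StreamJoinSketch {

    // Hypothetical event types; real pipelines would use much richer schemas.
    public static class Impression {
        public String sessionId; public String payload;
        public Impression() {}
        public Impression(String sessionId, String payload) { this.sessionId = sessionId; this.payload = payload; }
    }
    public static class Playback {
        public String sessionId; public String payload;
        public Playback() {}
        public Playback(String sessionId, String payload) { this.sessionId = sessionId; this.payload = payload; }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder sources; in practice both streams would come from Kafka.
        DataStream<Impression> impressions = env.fromElements(new Impression("s1", "impression-data"));
        DataStream<Playback> playbacks = env.fromElements(new Playback("s1", "playback-data"));

        impressions
            .connect(playbacks)
            .keyBy(i -> i.sessionId, p -> p.sessionId)   // co-partition both streams on the join key
            .process(new JoinFunction())
            .print();

        env.execute("two-stream keyed join (sketch)");
    }

    // Buffers whichever side of the join arrives first in keyed state and emits
    // a joined record when the other side arrives. A production job would also
    // register timers to expire state for keys that never find a match.
    public static class JoinFunction
            extends KeyedCoProcessFunction<String, Impression, Playback, String> {

        private transient ValueState<Impression> pendingImpression;
        private transient ValueState<Playback> pendingPlayback;

        @Override
        public void open(Configuration parameters) {
            pendingImpression = getRuntimeContext().getState(
                new ValueStateDescriptor<>("pendingImpression", Impression.class));
            pendingPlayback = getRuntimeContext().getState(
                new ValueStateDescriptor<>("pendingPlayback", Playback.class));
        }

        @Override
        public void processElement1(Impression imp, Context ctx, Collector<String> out) throws Exception {
            Playback pb = pendingPlayback.value();
            if (pb != null) {
                out.collect(imp.sessionId + ": " + imp.payload + " + " + pb.payload);
                pendingPlayback.clear();
            } else {
                pendingImpression.update(imp);   // wait for the matching playback event
            }
        }

        @Override
        public void processElement2(Playback pb, Context ctx, Collector<String> out) throws Exception {
            Impression imp = pendingImpression.value();
            if (imp != null) {
                out.collect(imp.sessionId + ": " + imp.payload + " + " + pb.payload);
                pendingImpression.clear();
            } else {
                pendingPlayback.update(pb);
            }
        }
    }
}
```

Co-partitioning both streams on the same key keeps the buffered state local to each parallel task, so Flink's checkpoints make the join fault tolerant, which is where the talk's points about keyed state, failure recovery, and schema evolution come in.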
2:55pm - 3:45pm
Session details to follow.
4:10pm - 5:00pm
We have been served well by Zookeeper over the years, but it is time for Kafka to stand on its own. This is a talk on the ongoing effort to replace the use of Zookeeper in Kafka: why we want to do it and how it will work. We will discuss the limitations we have found and how Kafka benefits, in terms of both stability and scalability, by bringing consensus in-house. This effort will not be completed overnight, but we will discuss our progress, what work remains, and how contributors can help.
5:25pm - 6:15pm
Debezium (noun | de·be·zi·um | /dɪ:ˈbɪ:ziːəm/) - Secret Sauce for Change Data Capture
Apache Kafka is a highly popular option for asynchronous event propagation between microservices. Things get challenging though when adding a service’s database to the picture: How can you avoid inconsistencies between Kafka and the database?
Enter change data capture (CDC) and Debezium. By capturing changes from the log files of the database, Debezium gives you both reliable and consistent inter-service messaging via Kafka and instant read-your-own-write semantics for services themselves.
In this session you’ll see how to leverage CDC for reliable microservices integration, e.g. using the outbox pattern, as well as many other CDC applications, such as maintaining audit logs, automatically keeping your full-text search index in sync, and driving streaming queries. We’ll also discuss practical matters, e.g. HA set-ups, best practices for running Debezium in production on and off Kubernetes, and the many use cases enabled by Kafka Connect's single message transformations.
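As a rough illustration of the outbox pattern mentioned above (the table and column names below are illustrative, not anything Debezium prescribes), a service writes its business row and the event describing it in the same database transaction. Debezium then captures the new outbox row from the transaction log and publishes it to Kafka, typically routed by its outbox event router transformation, so the database write and the Kafka message cannot diverge.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.UUID;

public class OutboxSketch {

    // Inserts an order and the event describing it in one transaction.
    // Debezium tails the database log, picks up the new outbox row, and
    // publishes it to Kafka, so consumers see exactly what was committed.
    public static void placeOrder(String jdbcUrl, String customerId, long amountCents) throws Exception {
        try (Connection conn = DriverManager.getConnection(jdbcUrl)) {
            conn.setAutoCommit(false);
            String orderId = UUID.randomUUID().toString();

            try (PreparedStatement insertOrder = conn.prepareStatement(
                     "INSERT INTO orders (id, customer_id, amount_cents) VALUES (?, ?, ?)");
                 PreparedStatement insertEvent = conn.prepareStatement(
                     "INSERT INTO outbox (id, aggregate_type, aggregate_id, event_type, payload) "
                   + "VALUES (?, ?, ?, ?, ?)")) {

                insertOrder.setString(1, orderId);
                insertOrder.setString(2, customerId);
                insertOrder.setLong(3, amountCents);
                insertOrder.executeUpdate();

                insertEvent.setString(1, UUID.randomUUID().toString());
                insertEvent.setString(2, "Order");
                insertEvent.setString(3, orderId);
                insertEvent.setString(4, "OrderPlaced");
                insertEvent.setString(5,
                    "{\"orderId\":\"" + orderId + "\",\"customerId\":\"" + customerId
                  + "\",\"amountCents\":" + amountCents + "}");
                insertEvent.executeUpdate();

                conn.commit();   // both rows become visible atomically
            } catch (Exception e) {
                conn.rollback(); // neither the order nor the event is persisted
                throw e;
            }
        }
    }
}
```

Because the event is persisted through the same transactional log that Debezium reads, the service gets read-your-own-write semantics locally while downstream consumers receive a reliable, ordered stream of changes via Kafka.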