Track: Emerging Trends in Data Engineering

Location: Bayview AB

Data Engineering is becoming increasingly relevant to our highly connected, AI-driven world. In the past, software engineers focused their efforts on developing scalable web architectures, until they realized that their biggest headache was their data architecture. For most of us, data architecture simply meant running an RDBMS for all of our needs, from transactional read-write workloads to ad-hoc point and scan analytics loads. As our data grew, so did our use cases for data-driven products (e.g. fraud detection systems, recommender systems, personalization services), and these two rising trends combined to stress our RDBMSs beyond their capabilities. Data engineers entered the field to solve these problems by introducing specialized data stores (e.g. search engines, graph engines, NoSQL stores, large-scale data processing frameworks such as Spark, and stream processing engines such as Beam, Flink, and Spark) and the machinery to glue them together (e.g. ETL pipelines, Kafka, Sqoop, Flume). Today, data architectures are as vast and varied as the use cases they support. What are some emerging technologies and trends in this space, and how are some cutting-edge companies solving their problems? Come to this track to learn more.

Track Host: Sid Anand

Hacker at Large, Co-chair @QCon & Data Council, PMC & Committer @ApacheAirflow

Sid Anand recently served as PayPal's Chief Data Engineer, focusing on ways to realize the value of data. Prior to joining PayPal, he held several positions including Agari's Data Architect, a Technical Lead in Search & Data Analytics @ LinkedIn, Netflix’s Cloud Data Architect, Etsy’s VP of Engineering, and several technical roles at eBay. Sid earned his BS and MS degrees in CS from Cornell University, where he focused on Distributed Systems. In his spare time, he is a maintainer/committer on Apache Airflow, a co-chair for QCon, and a frequent speaker at conferences. When not working, Sid enjoys spending time with family and friends.

10:35am - 11:25am

Data Engineering Open Space

11:50am - 12:40pm

Massively scaling MySQL using Vitess

Are you dealing with the challenges of rapid growth? Are you thinking about how to scale your database layer? Should you use NoSQL? Should you shard your relational database? If you are facing these kinds of problems, this session is for you. Vitess is a database solution for deploying, scaling and managing large clusters of MySQL instances. It's architected to run as effectively in a public or private cloud architecture as it does on dedicated hardware. It combines and extends many important MySQL features with the scalability of a NoSQL database. This session gives an overview of the salient features of Vitess, and at the end, we'll cover some advanced features with a demo.
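For a sense of what this looks like from the application side, here is a minimal, hypothetical sketch (not taken from the talk): Vitess's query router, vtgate, speaks the MySQL wire protocol, so an ordinary MySQL client can query a sharded keyspace without code changes. The host, port, credentials, and table names below are illustrative assumptions.

    # Hypothetical sketch: querying a sharded Vitess keyspace through vtgate,
    # which speaks the MySQL wire protocol. Connection details are assumed.
    import pymysql

    conn = pymysql.connect(
        host="vtgate.example.internal",  # vtgate endpoint (assumed)
        port=15306,                      # vtgate MySQL port used in the local examples
        user="app_user",
        password="app_password",
        database="commerce",             # a Vitess keyspace, exposed as a schema
    )

    with conn.cursor() as cur:
        # Vitess routes the query to the right shard(s) using the keyspace's
        # VSchema; the application code looks like plain MySQL access.
        cur.execute(
            "SELECT customer_id, email FROM customer WHERE customer_id = %s", (42,)
        )
        print(cur.fetchone())

    conn.close()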

Sugu Sougoumarane, Co-Founder / CTO @planetscaledata & Co-Creator @vitessio

1:40pm - 2:30pm

Transaction Processing in FoundationDB

FoundationDB provides users strongly consistent transactions without a two-phase commit protocol. This talk will go through the architecture of FoundationDB and describe what is happening in the internals of the database when a client commits a transaction.
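As a rough illustration of the client-side model the talk builds on, here is a minimal sketch using FoundationDB's official Python bindings: the @fdb.transactional decorator runs the function body as a single strictly serializable transaction and retries it on conflict. The ('balance', account) key layout and the amounts are illustrative assumptions, not details from the talk.

    # Minimal sketch of a FoundationDB transaction using the Python bindings.
    import fdb

    fdb.api_version(630)   # pin the client API version
    db = fdb.open()        # uses the default cluster file

    # Seed two illustrative accounts (each assignment commits its own transaction).
    db[fdb.tuple.pack(('balance', 'alice'))] = fdb.tuple.pack((100,))
    db[fdb.tuple.pack(('balance', 'bob'))] = fdb.tuple.pack((0,))

    @fdb.transactional
    def transfer(tr, src, dst, amount):
        # All reads and writes in this function commit atomically; on a
        # conflict, the decorator retries the whole function.
        src_key = fdb.tuple.pack(('balance', src))
        dst_key = fdb.tuple.pack(('balance', dst))
        src_bal = fdb.tuple.unpack(tr[src_key])[0]
        dst_bal = fdb.tuple.unpack(tr[dst_key])[0]
        tr[src_key] = fdb.tuple.pack((src_bal - amount,))
        tr[dst_key] = fdb.tuple.pack((dst_bal + amount,))

    transfer(db, 'alice', 'bob', 10)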

Evan Tschannen, Lead Developer/Committer FoundationDB

2:55pm - 3:45pm

Patterns of Streaming Applications

Stream processing engines are becoming pivotal in analyzing data. They have evolved from simple data transport and processing machinery into engines capable of complex processing. The necessary features and building blocks of these engines are well known, and most capable engines offer a familiar Dataflow-based programming model.

As with any new paradigm, building streaming applications requires a different mindset and approach. Hence, there is a need to identify and describe patterns and anti-patterns for building these applications; at present, that shared knowledge is scarce.

Drawing on my experience working with several engineers within and outside of Netflix, this talk will present the following:

  • A blueprint for streaming data architectures and a review of desirable features of a streaming engine
  • Streaming Application patterns and anti-patterns
  • Use cases and concrete examples using Flink

Attendees will come away with patterns that can be applied to any capable stream processing framework such as Apache Flink.
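As a small taste of the programming model, the sketch below shows one of the simplest streaming patterns, a keyed running count, written against the PyFlink DataStream API. It is a hypothetical illustration rather than an example from the talk, and the event data is made up.

    # Hypothetical PyFlink sketch: a keyed running count over a small stream.
    from pyflink.datastream import StreamExecutionEnvironment

    env = StreamExecutionEnvironment.get_execution_environment()
    env.set_parallelism(1)

    # In a real pipeline this would be a Kafka or other streaming source.
    events = env.from_collection([
        ("play", 1), ("pause", 1), ("play", 1), ("play", 1),
    ])

    counts = (
        events
        .key_by(lambda e: e[0])                      # partition by event type
        .reduce(lambda a, b: (a[0], a[1] + b[1]))    # maintain a running count per key
    )

    counts.print()
    env.execute("event_counts")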

Monal Daxini, Distributed Systems Engineer / Leader @Netflix

4:10pm - 5:00pm

Training Deep Learning Models at Scale on Kubernetes

Deep learning has recently become very important for all kinds of AI applications, from conversational chatbots to self-driving cars. In this talk, we will discuss how we use deep learning for natural language processing, train deep learning models with TensorFlow, run TensorFlow on top of Kubernetes, and make use of GPUs.

We need to train deep learning models for each conversational bot that we deploy on our platform. Training individual bots on one-off systems using ad-hoc processes is no longer feasible, as it does not scale with the number of bots in our system. To address these requirements, we have built a framework for running long-running jobs that leverages our existing Kubernetes infrastructure. We have designed our jobs framework to provide the following key benefits:

  1. Jobs can be executed on a fixed schedule, by a manual trigger, or by an automated trigger (i.e., some other event in our system can trigger a job).
  2. High availability of job workers.
  3. The ability to scale the number of workers for each job type up or down based on need.
  4. The ability to assign specific attributes to specific workers. For example, we ensure that our training workers always execute on GPU nodes so that they can take full advantage of the GPU resources available in our infrastructure.
  5. Simplified job management, including the ability to monitor, audit, and debug each job that was executed. Further, using our systems for centralized logging and monitoring, we can quickly understand key results from a job. For model training jobs, for example, we can quickly look at the confusion matrix to decide whether a trained model should be promoted to our production systems.

In the talk, we will present how we have leveraged Kubernetes to realize each of the above benefits.
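As a rough, hypothetical sketch of how such a framework might submit work (not PassageAI's actual implementation), the snippet below uses the official Kubernetes Python client to create a batch Job that is pinned to GPU nodes via a node selector and a GPU resource limit. The image name, node label, and namespace are assumptions.

    # Hypothetical sketch: submitting a training run as a Kubernetes Job
    # that is scheduled onto GPU nodes. Image, labels, namespace are assumed.
    from kubernetes import client, config

    config.load_kube_config()  # or load_incluster_config() when running in-cluster

    container = client.V1Container(
        name="train-bot-model",
        image="registry.example.com/bot-trainer:latest",      # assumed image
        args=["--bot-id", "support-bot-42"],                   # assumed arguments
        resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
    )

    pod_spec = client.V1PodSpec(
        containers=[container],
        restart_policy="Never",
        node_selector={"accelerator": "gpu"},  # only schedule onto GPU nodes (assumed label)
    )

    job = client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(name="train-support-bot-42"),
        spec=client.V1JobSpec(
            template=client.V1PodTemplateSpec(spec=pod_spec),
            backoff_limit=2,  # retry a failed worker a couple of times
        ),
    )

    client.BatchV1Api().create_namespaced_job(namespace="training", body=job)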

Deepak Bobbarjung, Founding Engineer @PassageAI
Mitul Tiwari, CTO @PassageAI

5:25pm - 6:15pm

The Whys and Hows of Database Streaming

Batch-style ETL pipelines have long been the de facto method for getting data from OLTP to OLAP database systems. At WePay, when we first built our data pipeline from MySQL to BigQuery, we adopted this tried-and-true approach. However, as our company scaled and our business needs grew, we observed a stronger demand for making data available for analytics in real time. This led us to redesign our pipeline around a streaming approach built on open-source technologies such as Debezium and Kafka.


This talk goes over the central design pattern behind database streaming, change data capture (CDC), and its advantages over alternative approaches such as triggers or event sourcing. To solidify the concept, we will go through our MySQL-to-BigQuery streaming pipeline in detail, explaining the core components involved and how we built the pipeline to be resilient to failure. Finally, we will expand on some of our ongoing work around the additional challenges we face when streaming from peer-to-peer distributed databases such as Cassandra, and some potential solutions to them.
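To make the CDC idea concrete, here is a minimal, hypothetical sketch of the consuming side of such a pipeline (not WePay's implementation): reading Debezium change events from Kafka and appending the row images to a BigQuery table. The topic, table, and connection details are assumptions; with Debezium's default JSON converter, each change event carries a "payload" envelope with "op", "before", and "after" fields.

    # Hypothetical sketch: Kafka consumer of Debezium change events feeding BigQuery.
    import json
    from kafka import KafkaConsumer
    from google.cloud import bigquery

    consumer = KafkaConsumer(
        "mysql.payments.transactions",            # Debezium topic (assumed)
        bootstrap_servers=["kafka:9092"],          # assumed broker address
        value_deserializer=lambda v: json.loads(v.decode("utf-8")) if v is not None else None,
        group_id="bq-loader",
    )
    bq = bigquery.Client()
    table_id = "my-project.payments.transactions_changelog"   # assumed table

    for message in consumer:
        if message.value is None:                  # skip tombstone records
            continue
        envelope = message.value["payload"]        # Debezium change envelope
        if envelope["op"] in ("c", "u", "r"):      # create, update, snapshot read
            row = envelope["after"]                # row image after the change
            errors = bq.insert_rows_json(table_id, [row])
            if errors:
                raise RuntimeError(f"BigQuery insert failed: {errors}")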

Joy Gao, Sr. Software Engineer @WePay
