DataEng
Past Presentations
Data Decisions With Realtime Stream Processing
At Facebook, we can move fast and iterate because of our ability to make data-driven decisions. Data from our stream processing systems provides real-time analytics and insights; the system is also built into various Facebook products, which have to aggregate data from many sources. In...
Panel: SQL Over Streams, Ask the Experts
Queries over streams are generally "continuous," executing for long periods of time and returning incremental results. Yet operations over streams must be able to be monotonic. A new generation of stream processing engines has added support for Stream SQL. This AMA / panel features a...
Experiences with Apache Beam
Apache Beam is an emerging programming API for streaming applications. This talk will discuss experience with Apache Beam from the "outside", including developing a runner for an existing streaming engine and how well Beam supports low latency streaming paradigms including complex analytics.
Fix Spark Failures and Bottlenecks Faster & Easier
This talk presents the results of analyzing many Spark jobs on many multi-tenant production clusters. Kirk discusses common issues seen, the symptoms of those issues, and how developers can address them. At Pepperdata, we have gathered trillions of performance data points on production clusters...
Patterns of Streaming Applications
Stream processing engines are becoming pivotal in analyzing data. They have evolved beyond data transport and simple processing machinery into engines capable of complex processing. The necessary features and building blocks of these engines are well known. And most capable engines have a...
Human-Centric Machine Learning Infrastructure @Netflix
Netflix has over 100 data scientists applying machine learning to a wide range of business problems, from title popularity predictions to quality-of-streaming optimizations. Our unique culture gives data scientists plenty of freedom to choose the modeling approach, libraries, and even the...
Interviews
Custom, Complex Windows @Scale Using Apache Flink
What's the focus of your work?
Recently, I’ve primarily been building data platforms. That is, platforms to enable Data and Software Engineers to collect and process data.
Data Decisions With Realtime Stream Processing
QCon: What's the focus of your work and of the team that you're on at Facebook?
Rajesh: My team is working on stream processing, and we are part of the real-time data organization, which focuses on faster, simpler, and smarter delivery of data. We want to reduce the time that people and our data-driven products wait on results. Our organization encompasses the stream...
Human-Centric Machine Learning Infrastructure @Netflix
Can you give an example of some of the questions you get from data scientists when you are trying to deploy models?
When it comes to common questions, as boring as it may sound, my experience is that machine learning infrastructure is much more about data than science. Most questions we get are related to data: how do I find the data I need, how do I set up the data pipeline, how do I handle the somewhat non-trivial amounts of data in Python and R,...