SESSION + Live Q&A

Straggler Free Data Processing in Cloud Dataflow

One of the main causes of performance problems in distributed data processing systems (from the original MapReduce to modern Spark and Flink) is "stragglers." Stragglers are parts of the input that take an unexpectedly long time to process, delaying the completion of the whole job and wasting the resources that sit idle while they finish. Stragglers can result from imbalanced data distribution or processing complexity, hardware or networking anomalies, and a variety of other factors.

Google Cloud Dataflow is the first system to address the problem of stragglers in a fully general way. It dynamically redistributes parts of already launched work from straggler workers onto idle workers to maximize utilization, while preserving data consistency and minimizing re-execution.
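
As a rough illustration of the idea, the sketch below splits the unprocessed tail off a running work item so that an idle worker can take it. It is not Cloud Dataflow's implementation or API: the `OffsetRange` class and its `try_claim` and `try_split` methods are hypothetical names, and it assumes the work item is a contiguous range of record offsets. Records already claimed stay with the original worker, so nothing is processed twice and nothing already done is re-executed.

```python
class OffsetRange:
    """Hypothetical work item: a half-open range [start, stop) of record offsets."""

    def __init__(self, start: int, stop: int):
        self.start = start
        self.stop = stop
        self.position = start - 1  # last claimed offset; start - 1 means none yet

    def try_claim(self, offset: int) -> bool:
        """The worker calls this before processing a record at `offset`."""
        if offset >= self.stop:
            return False  # the offset now belongs to another work item
        self.position = offset
        return True

    def try_split(self, fraction_of_remainder: float):
        """Give away the tail of the unclaimed part; return it as a new range.

        Returns None when no non-empty tail can be split off.
        """
        first_unclaimed = self.position + 1
        remaining = self.stop - first_unclaimed
        split_at = first_unclaimed + int(remaining * fraction_of_remainder)
        if split_at < first_unclaimed or split_at >= self.stop:
            return None
        residual = OffsetRange(split_at, self.stop)
        self.stop = split_at  # the straggler keeps only [start, split_at)
        return residual


# A straggler has claimed offsets 0..9 of [0, 100) when a split is requested.
straggler = OffsetRange(0, 100)
for offset in range(10):
    straggler.try_claim(offset)
handed_off = straggler.try_split(0.5)  # keep half of the remainder, hand off the rest
```

In this sketch the straggler continues with its shortened range while `handed_off` can be scheduled on an idle worker. Because `try_claim` is checked before every record, the two ranges never overlap.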

This talk describes the theory and practice behind Cloud Dataflow's approach to straggler elimination, as well as the associated non-obvious challenges, benefits, and implications of the technique.



Speaker

Eugene Kirpichov

Cloud Dataflow Sr SE @Google

Eugene is a Senior Software Engineer on the Cloud Dataflow team at Google, working primarily on the autoscaling and work rebalancing secret sauce as well as the Apache Beam programming model. He is also very interested in functional programming languages, data visualization (especially...

Location

Fleming, 3rd flr.

Track

Modern Distributed Architectures

Topics

Cloud Dataflow, Stream Processing, Silicon Valley, Interview Available

From the same track

SESSION + Live Q&A Event Driven Architecture

Spotify's Reliable Event Delivery System

Spotify’s event delivery system is one of the foundational pieces of Spotify’s data infrastructure. It has a key requirement to reliably deliver complete data with a predictable latency and make it available to Spotify developers via a well-defined interface. Delivered data is then used to...

Igor Maravic

Software Engineer @Spotify

SESSION + Live Q&A Observability

Realtime & Personalized Notifications @Twitter

Twitter Notifications Infrastructure enables hundreds of millions of users to stay informed about what’s going on in their Twitter world. Our systems process large volumes of data (aka the Twitter firehose) and deliver realtime and personalized notifications to all kinds of users, ranging from...

Gary Lam

Tech Lead Notifications, Staff Software Engineer @ Twitter

Saurabh Pathak

Leads Notifications Team @Twitter

SESSION + Live Q&A Distributed Systems

Distributed Systems Theory for Practical Engineers

Distributed systems are a complex topic. There is abundant research on them, but it is sometimes hard for a beginner to know where to start. I would like to outline the main concepts of distributed systems, so the interested person can have a clear path on how to start their own research as well....

Alvaro Videla

Distributed Systems Engineer

SESSION + Live Q&A Open Space

Distributed Architectures Open Space

SESSION + Live Q&A NoSQL

Causal Consistency For Large Neo4j Clusters

In this talk we'll explore the new Causal clustering architecture for Neo4j. We'll see how Neo4j uses the Raft protocol as a robust underlay for intensive write operations, and how the new asynchronous scale-out mechanism provides enormous capacity for very demanding graph workloads. We'll...

Jim Webber

Chief Scientist @Neo4j
