SESSION + Live Q&A
Lessons From a ~Yearly Re-Write of a Data Pipeline
Every year, we’ve set ourselves a goal of dramatically improving the performance and efficiency of our core data pipelines. We’ve done this by re-writing, effectively from scratch, the streaming pipelines that are responsible for processing over 120,000 events per second to deliver realtime personalisation to millions of web and mobile clients.
From our initial custom ETL system to the latest generation powered by Apache Beam, we’ve learnt to both respect and ignore the common wisdom of not re-writing software that works.
In this talk, we walk through multiple generations of our multi-tenant, high-performance streaming data pipelines. We’ll compare the different approaches and frameworks, and highlight the lessons we’ve learnt from building performant data pipelines that deal with messy real-world data collection and aggregation.
Speaker
Jibran Saithi
Lead Architect @Qubit
Jibran is Lead Architect at Qubit. He has an unhealthy interest in plumbing data pipelines.
From the same track
Next Steps in Stateful Streaming with Apache Flink
Come learn how Apache Flink is making stateful stream processing even more expressive and flexible, supporting streaming applications that were previously not considered streamable. Over the last few years, data stream processing has redefined how many of us build data pipelines. Apache Flink is...
Stephan Ewen
Committer @ApacheFlink, CTO @dataArtisans
Drivetribe: A Social Network on Streams
Drivetribe is the world's biggest motoring destination, as envisioned by Jeremy Clarkson, Richard Hammond, and James May. Built on top of the Event Sourcing/CQRS pattern, the Drivetribe platform uses Apache Kafka as its source of truth and Apache Flink as its processing backbone. This talk aims...
Aris Koliopoulos
CTO @Drivetribe
Hamish Dickson
Backend engineer @Drivetribe
Streaming SQL Foundations: Why I ❤ Streams+Tables
What does it mean to execute robust streaming queries in SQL? What is the relationship of streaming queries to classic relational queries? Are streams and tables the same thing conceptually, or different? And how does all of this relate to the programmatic frameworks like we’re all familiar...
Tyler Akidau
Engineer @Google & Founder/Committer on Apache Beam
Streaming Reactive Systems & Data Pipes w. squbs
Reactive libraries are nothing new to the JVM. Reactive Streams as an SPI has even made its way into Java 9. However, their uses within microservice components are still for relatively narrow purposes like service orchestration. But we think differently. Our whole presence and universe can be...
Akara Sucharitakul
Principal MTS, Architect @PayPal
Anil Gursel
Software Engineer @PayPal