Session + Live Q&A

Modern Data Pipelines in AdTech—Life in the Trenches

There are various tasks that the modern data pipelines approach helps solve across domains, including advertising. Modern data pipelines allow us to process data more efficiently, with a diverse set of data transformation tools for both batch and streaming processing. AdTech is a traditional industry that constantly changes and innovates, and today it draws a lot of attention as the industry expands its reach and moves toward a cookieless world.

In this talk, you will learn how to use modern data pipelines for reporting and analytics, as well as for historical data reprocessing in AdTech. We'll dive deeper into each case, exploring the problem itself, the implementation, the challenges, and future improvements. When business rules change or errors are found in past data, we need to reprocess our historical data, and this is not a trivial task: each step requires a lot of time, precision, and computational resources. For this reason, a whole section of the talk is devoted to approaches to historical data reprocessing and data lifecycle management.
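As a concrete illustration of the reprocessing scenario described above (a minimal sketch, not taken from the talk), a common pattern is a partition-by-partition backfill with Spark. The paths, column names, and the revised revenue rule below are hypothetical assumptions, not Captify's actual pipeline:

    import org.apache.spark.sql.{SaveMode, SparkSession}
    import org.apache.spark.sql.functions._

    // Sketch: reprocess date-partitioned historical data after a business rule change.
    object HistoricalReprocessing {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("historical-reprocessing")
          .getOrCreate()

        // Reprocess one day at a time so failures are restartable
        // and resource usage stays bounded.
        val datesToReprocess = Seq("2022-01-01", "2022-01-02")

        datesToReprocess.foreach { date =>
          val impressions = spark.read
            .parquet(s"s3://bucket/impressions/date=$date") // hypothetical input path

          // Example of a changed business rule: recompute revenue with a new rate.
          val corrected = impressions
            .withColumn("revenue", col("impressions") * lit(0.002))

          // Write to a versioned output location, one partition at a time,
          // so already-correct history is left untouched.
          corrected.write
            .mode(SaveMode.Overwrite)
            .parquet(s"s3://bucket/impressions-v2/date=$date") // hypothetical output path
        }

        spark.stop()
      }
    }

Writing to a separate versioned location rather than overwriting in place is one way to keep the old data available for validation before consumers are switched over.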

Main Takeaways

1. Hear about using data pipelines in production, especially in advertising.

2. Learn how to deal with historical data reprocessing and data lifecycle management.


What is the focus of your work these days?

I mostly work on Captify's data pipelines, specifically in advertising. I joined the company almost three years ago. Besides maintenance, my main focus now is the creation of new pipelines, the adoption of new technologies, modernizing the pipelines we already have, and changes to the data infrastructure, so a lot of innovation as well.

And what's the motivation behind this talk?

To give more production-level examples of what happens for Big Data engineers when they work on a specific use case. I noticed at various events that a lot of talks focus on frameworks and how they can be used, but they may not give a very practical understanding of how to apply them to pipelines, beyond "if you build it, it will work." That's why I wanted to give this information to people who work on various pipelines, so they would gain knowledge of how pipelines fail in production, what kinds of issues can happen in general, and what kinds of pipelines can be used for specific tasks within the domain.

And how would you describe the persona and level of your target audience for this session?

I would describe this persona as a mid-to-senior-level Big Data engineer, or an architect who is going to modernize pipelines or who works at a company and is trying to understand how to improve its current data infrastructure and ecosystem. There are probably issues with the existing pipelines, and they are trying to understand how to solve those issues. I'd also like to share some tips on how to approach various tasks.

What would you like this persona to walk away with after your session?

I would like this persona to walk away with a better understanding of how to approach the issues they already have in their company, or with new knowledge about the advertising domain and the challenges that are out there, as well as an understanding of the importance of being knowledgeable about the product. None of the approaches to these problems is a golden standard; there are many approaches to the same problem, and the ones I present reflect the experience of many engineers at the company where I work. All of them can also be improved in the future, and other approaches exist. So a conference is a great place to share our experiences with the community as well.


Speaker

Roksolana Diachuk

Big Data Engineer @Captify

Roksolana works as a Big Data Engineer at Captify. She is a speaker at technical conferences and meetups and one of the Women Who Code Kyiv leads. She is passionate about Big Data, Scala, and Kubernetes. Her hobbies include building technical topics around fairytales and discovering new cities.

Date

Wednesday Apr 6 / 01:40PM BST (50 minutes)

Location

Whittle, 3rd flr.

Track

Modern Data Pipelines & DataMesh

Topics

Data Engineering

From the same track

Session + Live Q&A Data Engineering

Taming the Data Mess, How Not to Be Overwhelmed by the Data Landscape

Wednesday Apr 6 / 10:35AM BST

The data engineering field has evolved at a tremendous pace in the last decade; new systems that enable the processing of huge amounts of data have generated enormous opportunities, as well as challenges, for software practitioners. All these new tools and methodologies created a new set of...

Ismaël Mejía

Senior Cloud Advocate @Microsoft

Session + Live Q&A Data Engineering

Connecting Modern Data Pipelines and Data Products

Wednesday Apr 6 / 11:50AM BST

The complexity of tools, distributed systems, and the CAP theorem introduce tradeoffs that practitioners cannot avoid or ignore as they embrace the world of modern data pipelines. What strategies can you employ? This is where data products come into play. Understanding the business objectives of...

Dr. Einat Orr

Co-creator of @lakeFS, Co-founder & CEO of Treeverse

Roksolana Diachuk

Big Data Engineer @Captify

Ricardo Sueiras

Principal Advocate in Open Source @AWS

Ismaël Mejía

Senior Cloud Advocate @Microsoft

Session + Live Q&A Data Engineering

Orchestrating Hybrid Workflows with Apache Airflow

Wednesday Apr 6 / 02:55PM BST

According to analysts, 87 percent of enterprises have already adopted hybrid cloud strategies. Customers have many reasons why they need to support hybrid environments, from maximizing the value from heritage systems to meeting local compliance and data processing regulations. As they build...

Ricardo Sueiras

Principal Advocate in Open Source @AWS

Session + Live Q&A Data Engineering

Data Versioning at Scale: Chaos and Chaos Management

Wednesday Apr 6 / 04:10PM BST

Version control is fundamental when managing code, but what about data? Our data changes over time, first because it accumulates: we get new data points for new points in time. But this is not the only reason. We also have additional data added for past times, since we were able to get additional...

Dr. Einat Orr

Co-creator of @lakeFS, Co-founder & CEO of Treeverse
