Session + Live Q&A
Orchestrating Hybrid Workflows with Apache Airflow
According to analysts, 87 percent of enterprises have already adopted hybrid cloud strategies. Customers have many reasons for supporting hybrid environments, from maximizing the value of heritage systems to meeting local compliance and data processing regulations. As they build their data pipelines, they increasingly need to orchestrate them across on-premises and cloud environments. In this session, I will share how you can use Apache Airflow to orchestrate a workflow that draws on data sources both inside and outside the cloud.
Main Takeaways
1 Hear about Apache Airflow: what it is and how it helps with orchestrating data pipelines.
2 Learn about some of the tradeoffs of using Apache Airflow, and some new capabilities that teams can use.
What is the focus of your work these days?
I am a principal developer advocate for open source, which means that I work with builders to make AWS the best place to run open source workloads. Sometimes that is about helping to spread awareness of open source projects. Sometimes it's about helping open source projects figure out how to best work on AWS. This means I do demos, I write blogs and I speak at events like QCon.
And what is the motivation for your talk?
Over the last 12 months, I've been spending a lot of time on a very popular open source project called Apache Airflow, which data engineering teams use to automate their data pipelines. I also work with customers who are moving to the cloud, and they are increasingly looking for solutions that work both inside and outside the cloud. So how do they work in hybrid environments? The motivation for this talk is to show how you can take an open source workflow orchestrator such as Apache Airflow and extend it so that teams can orchestrate workflows anywhere.
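As a rough illustration of what "workflow orchestrator" means here, the toy sketch below models a pipeline as a directed acyclic graph of tasks and runs each task only after its upstream tasks finish. It uses Python's standard-library `graphlib`, not Airflow's API, and the task names (`extract_on_prem`, `extract_cloud`, `transform`, `load`) are hypothetical stand-ins for work running in different environments.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


# Hypothetical tasks: in a real hybrid pipeline these would call systems
# running on premises or in the cloud. Here they just return a description.
def extract_on_prem() -> str:
    return "rows from an on-premises database"


def extract_cloud() -> str:
    return "objects from cloud storage"


def transform() -> str:
    return "joined dataset"


def load() -> str:
    return "written to the warehouse"


TASKS: Dict[str, Callable[[], str]] = {
    "extract_on_prem": extract_on_prem,
    "extract_cloud": extract_cloud,
    "transform": transform,
    "load": load,
}

# Each key maps to the set of upstream tasks it depends on.
DEPENDENCIES: Dict[str, Set[str]] = {
    "extract_on_prem": set(),
    "extract_cloud": set(),
    "transform": {"extract_on_prem", "extract_cloud"},
    "load": {"transform"},
}


def run_pipeline() -> List[str]:
    """Run every task once, in an order where upstream tasks always come first."""
    order: List[str] = []
    for name in TopologicalSorter(DEPENDENCIES).static_order():
        result = TASKS[name]()
        print(f"{name}: {result}")
        order.append(name)
    return order


if __name__ == "__main__":
    run_pipeline()
```

The two extract tasks have no dependency on each other, so their relative order is not fixed; `transform` always runs after both, and `load` always runs last. In Airflow itself, dependencies like these are declared between tasks in a DAG, for example with the `>>` operator, and the scheduler decides when each task runs.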
You touched on this a little bit with the last question, but how would you describe the persona and level of the target audience for your session?
The personas that I typically target are data engineers and sysadmins. These are the folks who have to understand which solutions and tools are available to help them solve problems, understand the capabilities and how to use them, architect the solution, and ultimately do the wiring and deployment of all this stuff. Those two broad groups of personas are the ones I address in my demo.
What would you like this persona to walk away with at the end of your presentation?
The key thing is to understand, first of all, how Apache Airflow can help you orchestrate data pipelines in hybrid environments, that you have a number of different ways to do that, and that each of those has different tradeoffs. I hope they will come away with a better understanding of the options they have, as well as some of the new capabilities and innovations we've released that make this much easier for these teams.
Speaker
Ricardo Sueiras
Principal Advocate in Open Source @AWS
Over 30 years spent working in the technology industry, helping customers solve business problems with open source and cloud. Currently I am a Developer Advocate at AWS focusing on open source, where I help raise awareness of AWS and our customers' open source projects and technology, and work...
From the same track
Modern Data Pipelines in AdTech—Life in the Trenches
Wednesday Apr 6 / 01:40PM BST
There are various tasks that the modern data pipelines approach helps us solve in different domains, including advertising. Modern data pipelines allow us to process data in a more efficient manner with a diverse set of data transformation tools for both batch and streaming data processing....
Roksolana Diachuk
Big Data Engineer @Captify
Taming the Data Mess, How Not to Be Overwhelmed by the Data Landscape
Wednesday Apr 6 / 10:35AM BST
The data engineering field has evolved at a tremendous pace in the last decade. New systems that enable the processing of huge amounts of data have generated enormous opportunities, as well as challenges, for software practitioners. All these new tools and methodologies created a new set of...
Ismaël Mejía
Senior Cloud Advocate @Microsoft
Connecting Modern Data Pipelines and Data Products
Wednesday Apr 6 / 11:50AM BST
The complexity of tools, distributed systems, and the CAP theorem introduce tradeoffs that practitioners cannot avoid or ignore as they embrace the world of modern data pipelines. What strategies can you employ? This is where data products come into play. Understanding the business objectives of...
Dr. Einat Orr
Co-creator of @lakeFS, Co-founder & CEO of Treeverse
Data Versioning at Scale: Chaos and Chaos Management
Wednesday Apr 6 / 04:10PM BST
Version control is fundamental when managing code, but what about data? Our data changes over time. First, it accumulates: we have new data points for new points in time. But this is not the only reason. We also have additional data added to past time, since we were able to get additional...
Dr. Einat Orr
Co-creator of @lakeFS, Co-founder & CEO of Treeverse