SESSION + Live Q&A

Data Inferno: 9 Circles of Data Tests With Apache Airflow

Continuous delivery is a given nowadays, and it goes hand in hand with a lot of automated testing. For 'normal' applications, such testing is well known and documented in the form of unit tests, integration tests, regression tests, etc. For big data applications, however, another dimension of complexity is added: that of the data itself. The truth is that real data sucks: it always surprises you by how much it differs from what you expect. Unreliable data, in turn, can result in unreliable applications, which makes for unhappy users. In this talk, we'll take you on a journey through our Nine Circles of Data Tests, which ensure the data is correct and makes sense. We use Airflow to do this, testing our data and logic at several steps, so that we don't have to debug such issues over the weekend.

Topics include:

  • CI tests for your data deployments
  • Integrating data tests into your DAG
  • DTAP-ing your data deployments
  • Integrating data science models into this engineering world
  • How we went nuclear with Git
  • How Chuck Norris keeps us honest
  • Local Airflow in Docker

Speaker

John Müller

Data Engineer WB Advanced Analytics @ING_news (ING Bank)

John works as a Data Engineer at WB Advanced Analytics of ING Bank. Working with loads of data from all kinds of different source systems gets you intimately familiar with good practices in Data Engineering, as you're going to need them all when working with all of it.

Location

Westminster, 4th flr.

Track

Solutions Track IV
