A Definitive Compilation of Apache Airflow Resources

These are the resources I’ve used to understand Airflow and develop a deeper intuition for the data architecture frameworks associated with it.

Official Resources

  1. Airflow Documentation
  2. Airflow GitHub Repo
  3. Airflow Wiki Home
  4. Airflow Community Slack Channel
  5. Airflow Twitter
  6. Airflow JIRA

Introduction/Overview

  1. ‘Airflow: a workflow management platform’ by Maxime Beauchemin (Creator of Apache Airflow). Airbnb Engineering and Data Science. June, 2015.
  2. ‘Apache Airflow Grows Up!’ by Sid Anand (Chief Data Engineer, PayPal. Committer & PMC Member, Apache Airflow). Medium. Jan, 2019.
  3. ‘Understanding Apache Airflow’s Key Concepts’ by Dustin Stansbury (Data Scientist, Quizlet). Medium (~ 1.8k +1’s). May, 2017.
  4. ‘Airflow 101: How to start automating your data pipelines with Airflow’ by Sriram Baskaran (Program Director, Data Engineering, Insight Data Science). Medium (~ 1.2k +1’s). Oct, 2018.
  5. [Video] ‘Best practices with Airflow - an open source platform for workflows & schedules’ by Maxime Beauchemin (Creator of Apache Airflow).
  6. [Video] ‘Modern ETL-ing with Python and Airflow (and Spark)’ by Tamara Mendt (Data Engineer, HelloFresh). PyConDE 2017.
  7. [Video] ‘A Practical Introduction to Airflow’ by Matt Davis (Data Platform Engineering at Clover). PyData SF 2016.
  8. [Video] ‘Developing elegant workflows in Python code with Apache Airflow’ by Michael Karzynski (Tech Lead at Intel). EuroPython Conference. July 2017.
  9. [Video] ‘How I learned to time travel, or, data pipelining and scheduling with Airflow’ by Laura Lorenz (Data & SWE at Industry Dive). PyData DC 2016.

Airflow in the Industry

  1. ‘Managing Uber’s Data Workflows at Scale’ by Alex Kira. Uber Data Engineering. Feb, 2019.
  2. ‘Productionizing ML with workflows at Twitter’ by Samuel Ngahane and Devin Goodsell. Twitter Engineering. June, 2018.
  3. ‘Apache Airflow at Pandora’ by Ace Haidrey. Pandora Engineering. Mar, 2018.
  4. ‘Running Apache Airflow at Lyft’ by Tao Feng, Andrew Stahlman, and Junda Yang. Lyft Engineering. Dec, 2018.
  5. ‘Why Robinhood uses Airflow’ by Vineet Goel. Robinhood Engineering. May, 2017.
  6. ‘Collaboration between data engineers, data analysts and data scientists’ by Germain Tanguy (Senior Data Engineer, Dailymotion). Dailymotion Engineering. May, 2019.
  7. ‘How Sift Trains Thousands of Models using Apache Airflow’ by Duy Tran. Sift Engineering. Mar, 2018.
  8. ‘Airflow, Meta Data Engineering, and a Data Platform for the World’s Largest Democracy’ by Vinayak Mehta. SocialCops Engineering. Aug, 2018.
  9. ‘Data Traffic Control with Apache Airflow’ by Nicolas Goll Perrier (Data Engineer, leboncoin). leboncoin Engineering Blog. Jan, 2019.
  10. ‘Airflow Part 2: Lessons Learned (at SnapTravel)’ by Nehil Jain. SnapTravel Engineering. Jun, 2018.
  11. ‘Using Apache Airflow to Create Data Infrastructure in the Public Sector’ by Varun Adibhatla and Laurel Brunk. Astronomer.io. Oct, 2017.
  12. ‘Airflow at WePay’ by Chris Riccomini. WePay. Jul, 2016.

Additional Reading

  1. ‘How Apache Airflow Distributes Jobs on Celery workers’ by Hugo Lime (Data Scientist, Sicara AI and Big Data). Sicara Engineering. Apr, 2019.
  2. ‘A Guide On How To Build An Airflow Server/Cluster’ by Tianlong Song (Sr. Software Engineer, Machine Learning & Big Data at Zillow). Oct, 2016.
  3. ‘We’re All Using Airflow Wrong and How to Fix It’ by Jessica Laughlin. Bluecore Engineering. Aug, 2018.

I will update this post as I continue my work and find more helpful resources. Please feel free to suggest additions to this list.

For an even more exhaustive compilation of Airflow resources, check out the ‘Awesome Apache Airflow’ GitHub repo by Jakob Homan (Data Software Engineer, Lyft. Airflow Committer and PMC Member).

Aakash Pydi

Code. Debug. Repeat. Currently a Software Engineer on a DevOps+Data Engineering team at Cerner. Always promoting curiosity, camaraderie and compassion.

Written on June 15, 2019