Date | TL;DW | Video |
---|---|---|
2023-01-12 | DAG overview page is now pretty | https://youtu.be/E3_YGF7Wr2k |
2023-01-05 | Developed the first Airflow page with an overview of DAGs | https://youtu.be/oFs4OsHZfRw |
2022-12-21 | First video that started this SEG | https://youtu.be/Jrjp6_rdDo4 |
Airflow is the de facto tool for data teams to schedule and execute ELT pipelines, machine learning pipelines, DevOps tasks, and really any task that requires scheduling. It's cron turned up to 11.
According to Airflow themselves:
Airflow is a platform created by the community to programmatically author, schedule and monitor workflows
Source: https://airflow.apache.org
In Airflow, a workflow is defined as a Directed Acyclic Graph (DAG). A DAG contains tasks, and each task is an instance of an operator.
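To make the DAG idea concrete, here is a minimal sketch that uses only the Python standard library. It is not Airflow's API: the task names (`extract`, `load`, `transform`, `report`) and the `run_order` helper are hypothetical. It only shows how a set of tasks with upstream dependencies resolves to a valid execution order, and why the graph must not contain cycles.

```python
from graphlib import TopologicalSorter

# Hypothetical ELT pipeline: each task maps to the set of tasks it
# depends on (its upstream tasks). These names are illustrative only.
dependencies = {
    "extract": set(),
    "load": {"extract"},
    "transform": {"load"},
    "report": {"transform"},
}


def run_order(deps):
    """Return one valid execution order for the task graph.

    Raises graphlib.CycleError if the graph contains a cycle, which is
    why a workflow has to be acyclic to be schedulable.
    """
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

In Airflow itself, the same structure would be declared in a DAG file, with each task created from an operator (for example `BashOperator`) and the dependencies between tasks expressed with the `>>` operator.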
A typical development workflow looks like:
Below are some common challenges related to Airflow, in no particular order:
Below are a few of the initial options to integrate GitLab and Airflow: