Airflow 3 Roadmap Discussion

Published: 15 October 2024
Channel: Apache Airflow

Presented at Airflow Summit 2024

Join us in this panel with key members of the community behind the development of Apache Airflow where we will discuss the tentative scope for the next generation, i.e. Airflow 3.

-----
(GenAI summary ahead)

This transcript captures a panel discussion about Apache Airflow 3. The panelists are:

**Kaxil Naik**: Senior Director of Engineering at Astronomer, Airflow PMC member.
**Shubham Mehta**: Senior Product Manager at AWS, working on upstream contributions to Airflow.
**Michal Modras**: Engineering Manager at Google Cloud Composer, involved in Airflow discussions from the beginning.
**Constance Martineau**: Senior Product Manager at Astronomer, works with CX and open source team.
**Madison Swain-Bowden (Moderator)**: Staff Data Engineer at Automattic, Airflow community member since 2017.

The discussion revolves around four major themes of Airflow 3:

1. *Remote and Edge Execution:*
Driven by organizations with data residency requirements, needing to run workflows across regions, hybrid clouds, and on-premise systems.
Enables optimization by processing data locally.
Facilitates migration from tools like Autosys and Control-M.
Allows running tasks in any language, with support for Windows hosts and GPU utilization.

2. *Temporality (Data Assets and Event-Driven Scheduling):*
Addresses the need to depend on external data sources not produced within the Airflow instance.
Leverages existing components like sensors, deferrable operators, and data-driven scheduling.
Introduces the concept of "data assets," expanding beyond storage to include ML models, with support for partitions representing data slices.
Enables block-level lineage, data quality-based orchestration, and robust data operations.
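
To make the event-driven idea above concrete, here is a minimal conceptual sketch in plain Python. It is not Airflow's API: the `AssetScheduler` class, its method names, and the `s3://` URIs are all hypothetical, chosen only to illustrate a workflow becoming runnable once every data asset it depends on has reported an update.

```python
from dataclasses import dataclass, field


@dataclass
class AssetScheduler:
    """Toy model (hypothetical, not Airflow code): a workflow is ready to run
    once every asset it depends on has been updated since its last run."""

    # workflow name -> set of asset URIs it waits for
    dependencies: dict[str, set[str]] = field(default_factory=dict)
    # workflow name -> assets updated since that workflow last became ready
    pending: dict[str, set[str]] = field(default_factory=dict)

    def register(self, workflow: str, assets: set[str]) -> None:
        self.dependencies[workflow] = set(assets)
        self.pending[workflow] = set()

    def asset_updated(self, uri: str) -> list[str]:
        """Record an update event and return the workflows that are now ready."""
        ready = []
        for workflow, needed in self.dependencies.items():
            if uri in needed:
                self.pending[workflow].add(uri)
                if self.pending[workflow] == needed:
                    ready.append(workflow)
                    self.pending[workflow] = set()  # reset for the next cycle
        return ready


scheduler = AssetScheduler()
scheduler.register("train_model", {"s3://bucket/orders", "s3://bucket/customers"})
scheduler.register("daily_report", {"s3://bucket/orders"})

first = scheduler.asset_updated("s3://bucket/orders")      # only daily_report is satisfied
second = scheduler.asset_updated("s3://bucket/customers")  # train_model now has both inputs
```

The point of the sketch is that nothing here is time-based: runs are triggered by update events on assets, which may originate outside the orchestrator.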

3. *Tasks in Other Languages:*
Addresses the needs of organizations with diverse tech stacks, allowing users to code in their preferred languages.
Improves upon the limitations of using the BashOperator for non-Python tasks.
Aims to provide first-class support for languages like Kotlin and TypeScript.
Plans to expand multi-language support to the DAG level in the future.

4. *Usability Features:*
*Performance:*
Significant improvements in DAG parsing and processing, reducing CPU load and database contention.
Scheduler optimizations for higher task throughput, faster DAG launches, and on-demand execution.
Backfills managed by the scheduler, enabling prioritization and preventing impact on production workloads.
*Multi-Team Management:*
Introduces "team" as an ingrained part of Airflow, allowing for team-specific configurations and isolation.
Enables sharing of resources like scheduler, web server, and database instance to reduce costs.
Data assets become the shared layer for interaction between different teams' DAGs.
*UI Revamp:*
Modernized UI based entirely on React, providing a more intuitive and scalable experience.
Redesigned homepage and DAG list view to handle the increasing number of DAGs and tasks.
*Smooth Migration:*
Prioritizes ease of transition from Airflow 2 to 3.
Well-behaved DAGs should be compatible with both versions.
Migration tool planned for transforming Airflow 2 deployments to Airflow 3.
*DAG Versioning:*
DAG versioning will be a core feature in Airflow 3.
Introduces "DAG bundles" for fetching DAGs from different sources and versioning them.
Provides full DAG versioning history in the UI.
Allows registering DAG bundles via the API.
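
As a rough illustration of what versioning a bundle of DAG files could involve, the sketch below derives a version identifier from file contents and keeps a simple history. This is an assumption-laden toy, not how Airflow implements DAG bundles: `bundle_version`, `BundleHistory`, and the content-hash scheme are all hypothetical.

```python
import hashlib


def bundle_version(files: dict[str, str]) -> str:
    """Derive a deterministic version id from a bundle's file paths and contents."""
    digest = hashlib.sha256()
    for path in sorted(files):  # sort so the id does not depend on dict order
        digest.update(path.encode())
        digest.update(b"\0")
        digest.update(files[path].encode())
        digest.update(b"\0")
    return digest.hexdigest()[:12]


class BundleHistory:
    """Hypothetical in-memory history: appends a version only when content changes."""

    def __init__(self) -> None:
        self.versions: list[str] = []

    def record(self, files: dict[str, str]) -> str:
        version = bundle_version(files)
        if not self.versions or self.versions[-1] != version:
            self.versions.append(version)
        return version


history = BundleHistory()
v1 = history.record({"etl.py": "extract >> load"})
v1_again = history.record({"etl.py": "extract >> load"})       # unchanged content, same id
v2 = history.record({"etl.py": "extract >> transform >> load"})  # edited DAG, new id
```

A history like `history.versions` is the kind of data a UI would need in order to show which DAG definition a given run used.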

The panel discussion concludes with a call to action for community involvement:

Fill out the Airflow Debugging Improvement survey.
Join the #airflow-debugging channel on Slack.
Contribute to the Airflow 3 release GitHub issue.
Provide feedback on the dev mailing list.

Overall, the panel discussion highlights the significant advancements coming in Airflow 3, addressing key user needs and setting the stage for a more scalable, performant, and user-friendly orchestration platform.