Presented by Udit Saxena at Airflow Summit 2024.
This talk explores ASAPP’s use of Apache Airflow to streamline and optimize our machine learning operations (MLOps).
Key highlights include:
Integrating with our custom Spark solution to achieve speedups, efficiency, and cost gains for generative AI transcription, summarization, and intent-categorization pipelines.
Design patterns for integrating with efficient LLM inference servers, such as TGI, vLLM, and TensorRT, for summarization pipelines with and without Spark.
An overview of batched LLM inference orchestrated with Airflow, as opposed to real-time inference served outside of it.
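To make the batch-versus-real-time distinction concrete: a scheduled Airflow task typically drains a backlog of accumulated records in fixed-size batches, with one LLM call per batch, rather than answering requests one at a time. The sketch below is a hypothetical illustration of that batching step, not ASAPP's actual code; the function names, the stubbed `summarize_batch` call, and the batch size are all assumptions.

```python
from typing import Iterator, List


def chunked(records: List[str], batch_size: int) -> Iterator[List[str]]:
    """Split accumulated transcripts into fixed-size batches, one per LLM call."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def summarize_batch(batch: List[str]) -> List[str]:
    # Stand-in for a request to an LLM inference server (e.g. TGI or vLLM);
    # stubbed here so the sketch is self-contained.
    return [f"summary:{text[:20]}" for text in batch]


def run_batch_inference(records: List[str], batch_size: int = 32) -> List[str]:
    """What a scheduled batch-inference task would do: process the backlog in chunks."""
    summaries: List[str] = []
    for batch in chunked(records, batch_size):
        summaries.extend(summarize_batch(batch))
    return summaries
```

In an Airflow deployment, `run_batch_inference` would be the body of a scheduled task, while real-time inference would instead hit the LLM server directly per request.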
Additionally, the talk covers ASAPP’s MLOps journey with Airflow over the past few years, including an overview of our cloud infrastructure and our various data backends and sources.