Streamlining a Mortgage ETL Pipeline with Apache Airflow

Published: 19 October 2024
Channel: Apache Airflow

Presented by Zhang Zhang & Jenny Gao at Airflow Summit 2024.

At Bloomberg, our team is responsible for the timely delivery to clients worldwide of a vast dataset: approximately 5 billion data points on roughly 50 million loans and more than 1.4 million securities, disclosed twice a month by three major government-sponsored mortgage entities. Our biggest challenge has been ingesting this data so that we can derive the complex data structures that our client-facing applications consume. In this talk, we discuss our transition from a manually managed, spreadsheet-based system to an automated, centralized orchestration tool, and how Apache Airflow has helped make the process more transparent, predictable, and visible.