Data Pipeline HealthCheck for Correctness, Performance, and Cost Efficiency

Published: October 10, 2024
Channel: Apache Airflow

We are witnessing rapid growth in the number of mission-critical data pipelines that leaders of data products are responsible for. "Are your data pipelines healthy?" This question was posed to more than 200 leaders of data products from various industries. The answers ranged from "unfortunately, no" to "they are mostly fine, but I am always afraid that something or other will cause a pipeline to break". This talk presents the concept of Pipeline HealthCheck (PHC), which gives leaders of data products high confidence in the correctness, performance, and cost efficiency of their data pipelines.