Hybrid Apache Spark Architecture with YARN and Kubernetes

Опубликовано: 01 Октябрь 2024
на канале: Databricks
1,794
22

Lyft is on the mission to improve people’s lives with the world’s best transportation. Starting 2019, Lyft has been running both Batch ETL and ML spark workloads primarily on Kubernetes with the Apache Spark on k8s operator. However, with the increasing scale of workloads in frequency and resource requirements, we started hitting numerous reliability issues related to IP allocation, container images, IAM role assignment, and Kubernetes Control Plane.

To continue supporting growing Spark usage with Lyft, the team came up with a hybrid architecture optimized for containerized and non-containerized workload based on Kubernetes and YARN. In this talk, we will also cover a dynamic runtime controller that helps with per environment config overrides and easy switchover between resource managers.

Connect with us:
Website: https://databricks.com
Facebook:   / databricksinc  
Twitter:   / databricks  
LinkedIn:   / databricks  
Instagram:   / databricksinc   Databricks is proud to announce that Gartner has named us a Leader in both the 2021 Magic Quadrant for Cloud Database Management Systems and the 2021 Magic Quadrant for Data Science and Machine Learning Platforms. Download the reports here. https://databricks.com/databricks-nam...