Improving Apache Spark Structured Streaming Application Processing Time by Configurations, Code Optimizations, and Custom Data Source

Published: 12 March 2025
Channel: Academia de Dados

Improving Apache Spark Structured Streaming Application Processing Time by Configurations, Code Optimizations, and Custom Data Source
Databricks Data + AI Summit 2022

Kineret Raviv
Principal Software Engineer
Akamai

Nir Dror
Principal Performance Engineer
Akamai

In this session, we'll go over several use cases and describe how we reduced the micro-batch time of our Spark Structured Streaming application from ~55 to ~30 seconds in several steps.

Our app processes ~700 MB/s of compressed data under very strict KPIs, and it uses several technologies and frameworks, including Spark 3.1, Kafka, Azure Blob Storage, AKS, and Java 11.

We'll share our work and experience in these areas, and go over a few tips for building better Spark Structured Streaming applications.

The main areas discussed are Spark configuration changes, code optimizations, and the implementation of a custom Spark data source.
