Streaming data pipelines pose unique requirements for handling errors and other malfunctions because they run continuously and cannot be supervised manually. As a consequence, we need to automate error handling as much as possible.
This talk answers three critical questions in the context of data streaming: Which errors can occur? How should we handle the different kinds of errors? Which metrics help us track the health of streaming data pipelines?
We discuss (1) errors that happen when consuming Apache Kafka topics, e.g., when deserializing records, (2) errors that happen when producing records to Apache Kafka topics, e.g., when serializing data, (3) errors that happen when processing records, e.g., exceptions raised in data transformations, and (4) errors that are caused by external factors, e.g., when the streaming data pipeline exceeds available memory resources.
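To give a flavor of categories (1), (2), and (4), here is a minimal sketch of the hooks Kafka Streams offers for them; the application id, broker address, and topic names are hypothetical, and the handler choices are illustrative rather than the talk's recommendations:

```java
import java.util.Properties;

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.errors.DefaultProductionExceptionHandler;
import org.apache.kafka.streams.errors.LogAndContinueExceptionHandler;
import org.apache.kafka.streams.errors.StreamsUncaughtExceptionHandler.StreamThreadExceptionResponse;

public class ErrorHandlerConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "error-handling-sketch"); // hypothetical id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");     // assumed broker

        // (1) Consume-side errors: log and skip records that fail
        //     deserialization instead of crashing the stream thread.
        props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
                LogAndContinueExceptionHandler.class);

        // (2) Produce-side errors: the default handler fails the task;
        //     a custom ProductionExceptionHandler could skip or reroute
        //     records that cannot be serialized or sent.
        props.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG,
                DefaultProductionExceptionHandler.class);

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic"); // hypothetical topology

        KafkaStreams streams = new KafkaStreams(builder.build(), props);

        // (4) External factors such as exhausted resources surface as
        //     uncaught exceptions; decide whether to replace the failed
        //     thread or shut the client down.
        streams.setUncaughtExceptionHandler(
                exception -> StreamThreadExceptionResponse.REPLACE_THREAD);

        streams.start();
    }
}
```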
After introducing these potential errors, we show how to cope with them through design patterns, such as dead-letter queues, and practical approaches, such as log-based alerts.
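One common way to realize a dead-letter queue in the Kafka Streams DSL is to wrap the transformation result, branch on failure, and write failed records to a separate topic. The sketch below assumes hypothetical topic names ("orders", "orders-dlq", "orders-transformed") and a hypothetical transform function; it is one possible shape of the pattern, not a definitive implementation:

```java
import java.util.Map;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Branched;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Named;
import org.apache.kafka.streams.kstream.Produced;

public class DeadLetterQueueSketch {

    // Wrapper holding either the transformed value or the failed original.
    record Result(String original, String transformed, boolean failed) {}

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        KStream<String, String> input = builder.stream(
                "orders", Consumed.with(Serdes.String(), Serdes.String())); // hypothetical topic

        // Catch per-record exceptions and mark failures instead of
        // letting them kill the stream thread.
        Map<String, KStream<String, Result>> branches = input
                .mapValues(value -> {
                    try {
                        return new Result(value, transform(value), false);
                    } catch (Exception e) {
                        return new Result(value, null, true);
                    }
                })
                .split(Named.as("orders-"))
                .branch((key, result) -> result.failed(), Branched.as("dlq"))
                .defaultBranch(Branched.as("ok"));

        // Failed records go, unmodified, to the dead-letter topic for later inspection.
        branches.get("orders-dlq")
                .mapValues(Result::original)
                .to("orders-dlq", Produced.with(Serdes.String(), Serdes.String()));

        branches.get("orders-ok")
                .mapValues(Result::transformed)
                .to("orders-transformed", Produced.with(Serdes.String(), Serdes.String()));

        // Build and start a KafkaStreams instance with this topology as usual.
    }

    // Hypothetical transformation that may throw on bad input.
    static String transform(String value) {
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException("empty value");
        }
        return value.toUpperCase();
    }
}
```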
Finally, we discuss important metrics for monitoring the health of streaming data pipelines, e.g., consumer lag or the rate at which records are produced to dead-letter topics.
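As one illustration, consumer lag is exposed by the consumers embedded in a running Kafka Streams application via its metrics registry. A minimal sketch, assuming a started KafkaStreams instance is passed in; the metric name "records-lag-max" and group "consumer-fetch-manager-metrics" come from the Kafka consumer's built-in fetch metrics:

```java
import java.util.Map;

import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.streams.KafkaStreams;

public class ConsumerLagSketch {

    // Print the maximum consumer lag per embedded consumer; a real
    // monitoring agent would scrape or export this instead of printing.
    static void logMaxConsumerLag(KafkaStreams streams) {
        for (Map.Entry<MetricName, ? extends Metric> entry : streams.metrics().entrySet()) {
            MetricName name = entry.getKey();
            if ("records-lag-max".equals(name.name())
                    && "consumer-fetch-manager-metrics".equals(name.group())) {
                System.out.printf("%s %s = %s%n",
                        name.name(), name.tags(), entry.getValue().metricValue());
            }
        }
    }
}
```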
While we use examples from Kafka Streams applications, the presented content can be easily transferred to other stream processing frameworks.
Speaker: Stefan Sprenger
More: https://2023.berlinbuzzwords.de/sessi...
Web: https://2023.berlinbuzzwords.de/
Fediverse: https://floss.social/@berlinbuzzwords
LinkedIn: https://www.linkedin.com/company/13978964
Twitter: https://twitter.com/berlinbuzzwords