Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das

Опубликовано: 04 Ноябрь 2024
на канале: Spark Summit
37,108
339

“In Spark 2.0, we have extended DataFrames and Datasets to handle real time streaming data. This not only provides a single programming abstraction for batch and streaming data, it also brings support for event-time based processing, out-or-order/delayed data, sessionization and tight integration with non-streaming data sources and sinks. In this talk, I will take a deep dive into the concepts and the API and show how this simplifies building complex “Continuous Applications”.” - T.D.

Slides: http://www.slideshare.net/databricks/...

Databricks Blog: "Structured Streaming In Apache Spark 2.0: A new high-level API for streaming"
https://databricks.com/blog/2016/07/2...

// About the Presenter //
Tathagata Das is an Apache Spark Committer and a member of the PMC. He’s the lead developer behind Spark Streaming, and is currently employed at Databricks. Before Databricks, you could find him at the AMPLab of UC Berkeley, researching datacenter frameworks and networks with professors Scott Shenker and Ion Stoica.

Follow T.D. on -
Twitter:   / tathadas  
LinkedIn:   / tathadas