OpenLineage is a rapidly growing standard for metadata and lineage collection. Column-level lineage, one of the features most anticipated by its community, has been developed recently. In this talk, we:
show the foundations of column lineage within the OpenLineage standard,
provide a real-life demo of how it is automatically extracted from Spark jobs,
describe and demo column lineage extraction from SQL queries,
show how the lineage can be consumed in the Marquez backend.
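To make the idea concrete, here is a minimal sketch of what column-level lineage can look like on an output dataset. It assumes the general shape of the OpenLineage `columnLineage` dataset facet (a `fields` map from each output column to its `inputFields`) and omits other facet properties. The dataset names, column names, and the `upstream_columns` helper are hypothetical and used for illustration only.

```python
# Simplified, hypothetical fragment of an output dataset in a lineage event.
# Dataset and column names are made up for illustration.
output_dataset = {
    "namespace": "warehouse",
    "name": "public.orders_summary",
    "facets": {
        "columnLineage": {
            "fields": {
                # Each output column lists the input columns it was derived from.
                "total_amount": {
                    "inputFields": [
                        {"namespace": "warehouse", "name": "public.orders", "field": "price"},
                        {"namespace": "warehouse", "name": "public.orders", "field": "quantity"},
                    ]
                },
                "customer_id": {
                    "inputFields": [
                        {"namespace": "warehouse", "name": "public.orders", "field": "customer_id"},
                    ]
                },
            }
        }
    },
}


def upstream_columns(dataset, column):
    """Return 'dataset.field' strings for the inputs of one output column.

    Hypothetical helper, not part of any OpenLineage client library.
    """
    fields = dataset.get("facets", {}).get("columnLineage", {}).get("fields", {})
    inputs = fields.get(column, {}).get("inputFields", [])
    return [f"{i['name']}.{i['field']}" for i in inputs]


print(upstream_columns(output_dataset, "total_amount"))
```

A consumer of lineage events could walk such a structure to answer questions like "which source columns feed this report column?".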
We aim to focus our demos on the practical aspects of column-level lineage that are of interest to data practitioners all over the world.
Speakers: Paweł Leszczyński, Maciej Obuchowski
More: https://2023.berlinbuzzwords.de/sessi...
Web: https://2023.berlinbuzzwords.de/
Fediverse: https://floss.social/@berlinbuzzwords
LinkedIn: / 13978964
Twitter: / berlinbuzzwords