Apache Atlas meets Apache Flink

Josh Yeh, Yan Liu

English Session 2021-08-07 15:30 GMT+8  (ROOM : A) #bigdata

Apache Atlas has become one of the rock-star projects for metadata management, handling everything from data lineage to data tagging and terms. Apache Flink has likewise become the standard for stream processing. While Apache Flink is powerful at processing data at scale, tracking lineage has become a problem for Apache Flink applications.

In this session, I would like to share the recent community progress on connecting Apache Atlas and Apache Flink, and how the community can benefit from tracking the Apache Flink application’s metadata.

Speakers:

Josh Yeh: Cloudera Software Engineer currently working on streaming workflow governance with Apache Flink and Apache Atlas. Previous projects include developing Machine Learning Operations (MLOps), focusing on model lineage and model metrics, for the Cloudera multi-tenant SaaS platform and the on-premise product Cloudera Data Science Workbench (CDSW); building data pipeline and workload automation with ML/DL/AI frameworks (Keras, PyTorch, TensorFlow); CDSW NVIDIA GPU support; and Cloudera Manager HDFS and Hive Backup and Disaster Recovery (BDR).

Yan Liu: A Solution Engineer at Cloudera for 5 years, I have helped many customers successfully adopt Apache projects running in production.