ByteDance Data Lake Table Optimization Management Service Based on Apache Hudi

喻兆靖 (Zhaojing Yu)

Chinese Session, 2022-07-30 14:50 GMT+8 (Room A) #bigdata

ByteDance currently runs one of the largest data lake deployments of any company in China, covering 100 petabytes of data. As the number of tasks grows, the cost of managing them rises significantly. Meanwhile, Hudi's own table services, such as compaction and clustering, offer only fairly basic strategies. Against this background, ByteDance has built a table optimization management service for the unified management and optimization of Hudi tables. The service fully hosts adaptive Hudi table optimization tasks, and ByteDance plans to contribute it to the Hudi community in the near future.

Speakers:


Zhaojing Yu: ByteDance, Senior Development Engineer. I am currently responsible for data lake engine development on the ByteDance Data Lake team, and I am active in the Hudi community as an Apache Hudi Committer.