Data Lake & Data Warehouse

Track Chair: Lidong Dai

Data lakes and data warehouses are important solutions for storing and managing data, and they play a crucial role in data management, data analysis, and decision-making. The ASF hosts various data lake and data warehouse projects, for example Apache Hive, Apache Hudi, Apache Iceberg, Apache Paimon, Apache Cassandra, and Apache HBase. In this track, you will learn the latest status of data lakes and data warehouses, best practices from companies that run them in production, and the roadmaps of these projects.

2023-08-18

13:30 GMT+8 Challenges and Solutions in Building Real-Time Data Warehousing with Apache Flink, Apache Hive and Apache Iceberg (Chinese Session) Yan Liu 刘岩

14:00 GMT+8 Practice of building a real-time data lake based on Flink (Chinese Session) Wang Zheng, Min Zhongyuan

14:30 GMT+8 OpenEuler and Bigtop with Ambari: Empower Data Lake in the real world (Chinese Session) Yuqi Gu

15:00 GMT+8 Apache Linkis data processing practice in lake-silo architecture (Chinese Session) 王华磊

15:45 GMT+8 The practice and optimization of the Iceberg data lake at Xiaomi (Chinese Session) 肖杰宝

16:15 GMT+8 ByteDance's cost reduction and efficiency practice based on the Parquet format (Chinese Session) 徐庆, 王恩策

2023-08-19

13:30 GMT+8 How to boost a cloud-native lakehouse with 2x performance (Chinese Session) 史少锋

14:00 GMT+8 Apache Paimon streaming data lake: CDC ingestion into the lake and streaming reads (Chinese Session) 李劲松

14:30 GMT+8 Application of Apache SeaTunnel, the next generation of ultra-high performance big data integration tool, in the data lake scenario (Chinese Session) 代立冬

15:00 GMT+8 Innovative lakehouse design atop Apache Iceberg, Apache Arrow and Apache Parquet (Chinese Session) 吴刚, 付旭炜