Practices for Building a Real-Time Data Lake Based on Flink

Wang Zheng, Min Zhongyuan

Chinese Session 2023-08-18 14:00 GMT+8  #datalake

Real-time data lakes are a core component of modern data architectures, allowing businesses to analyze and query large amounts of data in real time. In this talk, we first introduce the current pain points of real-time data lakes, such as meeting requirements for high timeliness, data diversity, consistency, and accuracy. We then describe how to build a real-time data lake based on Flink and Iceberg, in two parts: how to ingest data into the lake in real time, and how to use Flink for ad hoc OLAP queries. Finally, we present some of the practical benefits ByteDance has gained from its real-time data lake.

Speakers:


Wang Zheng: ByteDance, Cloud-Native Computing R&D Engineer at Volcano Engine. Joined ByteDance in 2021 and has worked on the Infrastructure Open Platform team, mainly responsible for research and development of Serverless Flink and related areas.


Min Zhongyuan: ByteDance, Cloud-Native Computing R&D Engineer at Volcano Engine. Joined ByteDance in 2021 and has worked on the Infrastructure Open Platform team, mainly responsible for research and development of Serverless Flink, Flink OLAP, and related areas.