Practice of Dolphin Scheduler on Data Lake Based on Apache Hudi

Zhao Yuwei

English Session 2021-08-07 15:30 GMT+8 #workflowdatagovernance

A data lake is an enterprise-level data management platform for analyzing data from different types of sources. The data lake architecture integrates multiple data sources without restricting their schemas while ensuring the accuracy of the data. It can meet the needs of real-time analysis and also serve as a data warehouse for batch data mining. We therefore need an efficient, stable, and easily extensible task scheduling system to coordinate the external capabilities of the data lake, such as data ingestion, data storage, data exploration, data discovery, and data governance. In this talk I will share why we chose Apache Dolphin Scheduler as our task scheduling system and how we make it possible for data users to interact with the data lake easily, without having to pay attention to too many technical details.

Speakers:

Zhao Yuwei: Engaged in Hadoop-related development; current main focus is the research and development of task scheduling systems.