DolphinScheduler + Notebook: An Open-Source Big Data Studio

高楚枫

Chinese Session 2022-07-31 15:30 GMT+8 (Room: A) #bigdata

For big data engineers, developing and scheduling big data jobs usually happens in different environments. Once a job has been developed and debugged in the IDE, the code is copied or packaged into a scheduling tool to be scheduled. This slows development, and differences between the two environments can cause unpredictable problems. This presentation shows how to combine the open-source Apache DolphinScheduler scheduling tool with Apache Zeppelin and Jupyter notebooks to form a big data development studio. Once the data platform team has adapted the relevant environments, big data and AI engineers can develop and debug interactively online and schedule their work with one click. They no longer need to spend time on adaptation problems caused by inconsistent environments, which greatly improves the efficiency and experience of big data operations and development. The integration code covered in the talk is fully open source, and you are welcome to download it.

Speakers:


Chufeng Gao: Basic platform development engineer on the EMR data development team at Ali Cloud. He graduated from Shanghai Jiao Tong University and Purdue University and was previously an SDE at Amazon in Seattle. He is a contributor to Apache DolphinScheduler, Airflow, and Zeppelin, and is interested in new big data development platforms.