Practice and Reflections on the Big Data Python Ecosystem at Chuanzhi Education

张敬存,赵晨杰

Chinese Session 2022-07-31 16:50 GMT+8  (ROOM : B) #bigdata

Python accounts for a growing share of work in big data and artificial intelligence. The Spark 3.0 release noted that Python is now one of the most widely used languages on Spark. Python's syntax is simple and easy to learn, which helps enterprises solve problems in the big data ecosystem quickly. The big data Python ecosystem has two highlights: (1) Exposing big data computing power to Python users. Python APIs for big data components, such as PySpark and PyFlink, let developers who already know Python write big data jobs. (2) A distributed Python ecosystem built on big data storage and computing. Users work with Python library APIs while the computation runs on a big data engine underneath, as in TensorFlow On Flink, SparkTorch, and flink-Onnx-Pytorch.

In this talk, we share Chuanzhi Education's best practices and reflections on the big data Python ecosystem in its real-time recommendation business line. The project reads real-time data from Pulsar. It uses the Alink machine learning library to build offline features, to train and update models online, and to provide recall and ranking services through the corresponding recommendation algorithms. Because Alink supports Python well, the project also uses PyFlink, on a unified stream and batch architecture, for data processing and statistical analysis to build a user profile platform.

We will also discuss how the real-time recommendation module learns online from live feedback data and adjusts the model quickly in real time, forming a closed-loop system. Finally, we will present what we consider the best way to put all of the above into practice as a complete recommendation system on the big data Python ecosystem.
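The closed loop described above (score a candidate, observe the user's feedback, update the model right away) can be sketched in a few lines. This is only an illustrative sketch in plain Python, not the speakers' implementation: it does not use Alink, PyFlink, or Pulsar, and the `OnlineLogisticRegression` class, the `feedback_stream` generator, and the synthetic click rule are all hypothetical names and assumptions introduced for the example.

```python
import math
import random


class OnlineLogisticRegression:
    """Tiny click-probability model updated one feedback event at a time."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, clicked):
        # One stochastic gradient step on the log loss for this event.
        error = self.predict(features) - clicked
        for i, x in enumerate(features):
            self.weights[i] -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error


def feedback_stream(n_events, seed=0):
    """Hypothetical stand-in for feedback events read from a message queue."""
    rng = random.Random(seed)
    for _ in range(n_events):
        features = [rng.random(), rng.random()]
        # Synthetic rule (assumption): click when the first feature dominates.
        clicked = 1 if features[0] > features[1] else 0
        yield features, clicked


def run_closed_loop(n_events=5000):
    model = OnlineLogisticRegression(n_features=2)
    for features, clicked in feedback_stream(n_events):
        model.predict(features)          # serve: score the candidate
        model.update(features, clicked)  # learn: fold the feedback back in
    return model


if __name__ == "__main__":
    model = run_closed_loop()
    print(model.predict([0.9, 0.1]), model.predict([0.1, 0.9]))
```

In a production pipeline the generator would be replaced by a stream source and the update step by the chosen library's online training operator; the sketch only shows the shape of the serve-then-learn loop.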

Speakers:


Jingcun Zhang: Senior Research Fellow, Jiangsu Chuanzhi Education Technology Co., Ltd. 15 years of Java and big data development experience; focuses on real-time computing and has contributed more than 10 PRs to PyFlink.


Chenjie Zhao: Senior Research Fellow, Jiangsu Chuanzhi Education Technology Co., Ltd. Focuses on applying algorithms from the ML/DL/PR/KG domains.