OpenMLDB: An Enterprise-Grade Feature Platform Built Upon Spark

LU MIAN

Chinese Session 2022-07-29 14:40 GMT+8  #ai

OpenMLDB is an enterprise-grade feature platform that tackles the challenges of feature engineering for machine learning. It provides consistent features for offline training and online inference, and emphasizes efficient real-time feature extraction. In this talk, we will first introduce the design methodology of OpenMLDB, which is built on separate batch and real-time SQL engines. We will then focus on the detailed architecture, especially (1) the optimization techniques applied to Spark to improve the efficiency of batch feature processing, and (2) the unified execution plan generator, which inherently ensures consistency between the batch and real-time SQL engines. Finally, we will demonstrate a few use cases from real-world machine learning applications based on OpenMLDB.

Speakers:


LU MIAN: OpenMLDB Community; 4Paradigm. OpenMLDB PMC core member; tech lead of the HPC and database teams at 4Paradigm. Dr. LU Mian is currently a System Architect at 4Paradigm, where he manages the Database and HPC teams. He is also the maintainer of the open-source machine learning database project OpenMLDB. Before joining 4Paradigm, he was a staff engineer at Huawei. He obtained his PhD in Computer Science from the Hong Kong University of Science and Technology. His research interests focus mainly on databases and heterogeneous computing. He has published more than 20 relevant papers in top conferences and journals, such as VLDB, SIGMOD, and ICPP. His current R&D work at 4Paradigm is leading the teams that build high-performance, scalable AI infrastructure for real-world applications.