Real-time deep learning training PAI-ODL

刘童璇

Chinese Session 2022-07-30 14:00 GMT+8  #ai

DeepRec (PAI-TF) is Alibaba Group's unified engine for training and serving large-scale sparse models. It is widely used across Taobao, Tmall, Alimama, AutoNavi, AliExpress, Lazada, and other businesses, and supports Taobao's core search, recommendation, and advertising services. It handles ultra-large sparse training with billions of features and trillions of samples.

Online Deep Learning (ODL), built on DeepRec, Flink, Kafka, and Flink AI Flow, combines online learning with offline training into an integrated online and offline learning framework. Based on a cloud-native architecture, it gives users a complete solution from offline to online. This talk introduces a series of key technologies in ODL scenarios, including ultra-large sparse model training and prediction, model hot updates within seconds, correction of models trained in real time, model rollback and sample replay, sample repair, and elastic resource scheduling for real-time training.

Speakers:


Tongxuan Liu: Senior Technical Specialist at PAI, Alibaba Cloud Intelligent Computing Platform Division. Has long been engaged in the research and development of machine learning platforms and deep learning engines, is responsible for training and prediction optimization of large-scale sparse models, has provided long-term support for Alibaba's core businesses such as search, recommendation, and advertising, and is responsible for Alibaba's large-scale sparse model training frameworks, DeepRec and ODL.