Catch the P99 by the Tail -- Performance Tuning for Machine Learning Inference

兰青

Chinese Session 2022-07-29 16:00 GMT+8 #ai

As machine learning adoption grows, many companies face performance requirements when deploying their models. Whether the workload is online or offline, there are hard problems to solve, such as hardware selection and parameter configuration. For example, how do you optimize P90/P99 inference latency? How do you scale offline inference? In this session, we will introduce several common high-performance machine learning systems and share some of the challenges encountered when deploying them in real-world applications. We’ll also cover how to troubleshoot machine learning inference problems faster and how to ultimately improve CPU/GPU utilization. These machine learning architectures are built on popular open source frameworks such as Apache Spark, Deep Java Library (DJL), and Java Spring.

Speakers:

Lan Qing: Software Development Engineer, Amazon Cloud Technology. Qing is a software development engineer on the AWS machine learning platform. He is a co-author of DJL (djl.ai) and a PPMC member of Apache MXNet. He graduated from Columbia University in 2017 with a master’s degree in computer engineering. He has expertise in model training and inference, as well as practical experience in advertising and big data.