Bringing LLM to Everywhere via Machine Learning Compilation

Siyuan Feng

English Session 2023-08-18 15:45 GMT+8  #ai

Significant progress has been made in generative artificial intelligence and large language models (LLMs), which possess remarkable capabilities and the potential to fundamentally transform many domains. However, today's LLMs demand extensive computation and memory, so they typically run on servers with cloud GPUs. In this session, we introduce MLC-LLM, an open-source project based on Apache TVM that runs LLMs with GPU acceleration on PCs, mobile devices, and even in browsers via WebGPU.

Speakers:
Siyuan Feng: Shanghai Jiao Tong University, Ph.D. Student. I am a Ph.D. student in the Zhiyuan Honors Program at Shanghai Jiao Tong University. I am also a PMC member of Apache TVM, working closely with the community to develop new features, including TensorIR, Meta-Schedule, Auto-Tensorization, and Relax (the successor to Relay). Recently, I have been spending my time on MLC-LLM to deploy large language models on every device.