BladeDISC: A deep learning compiler practice that supports dynamic shapes
Xiafei Qiu
Chinese Session · 2022-07-30 16:00 GMT+8 · #ai

As deep learning continues to develop, model structures keep evolving and new computing hardware keeps emerging. Developers must not only use computing power effectively in complex and changing scenarios, but also keep up with the continuous iteration of computing frameworks. Deep learning compilers are the technical direction the industry has explored to solve these problems: they let developers focus on upper-level model development, reduce the labor cost of manual optimization, and further squeeze performance out of the hardware.
A number of deep learning compilers, represented by TensorFlow XLA and Apache TVM, have emerged in the industry. But as deep learning develops, more and more models exhibit dynamic tensor shapes. For example: 1) CNN models that accept input images of various sizes; 2) NLP models, represented by BERT, that are dynamic in both the batch-size and sequence-length dimensions. These compilers, however, were originally designed for static shape scenarios, which require every dimension of the input tensors to be fixed, so they do not work well with dynamic shapes. Facing strong demand from real workloads, the deep learning community has responded with work such as the TVM Relay VM. This talk introduces the Alibaba Cloud PAI team's work on BladeDISC, a compiler centered on dynamic shapes, covering the following:
- BladeDISC's main architecture: why and how to build a dynamic-shape compiler on the MLIR framework, why BladeDISC chose MHLO as the entry Dialect, and what modifications BladeDISC is making to the MHLO Dialect. How does the technical approach differ from Apache TVM, and what are the design considerations behind it?
- Challenges brought by dynamic shapes: many problems that are deterministic under static shape semantics, such as instruction-level vectorization, CodeGen template selection, and whether an implicit broadcast is needed, become far more complex in dynamic shape scenarios. Some compile-time decisions have to be moved to runtime.
- Large-granularity operator fusion: how to realize larger-scale operator fusion under dynamic shape semantics by exploiting hardware with low access cost (shared memory on GPUs, memory cache on CPUs), and how to derive and propagate constraints between tensor shapes even when the concrete shapes are unknown.
- Computation-intensive operators: how to improve their efficiency via vendor libraries, Apache TVM operator generation, hand-written operators, and other means, and how to architecturally unify non-compiler-generated operators with the compiler so that they complement each other.
- How to support a variety of deep learning frameworks, lower the barrier to entry for end users, and how BladeDISC is applied in Alibaba Cloud businesses.
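To make the dynamic-shape motivation above concrete, here is a minimal sketch (all names hypothetical, not BladeDISC's actual API) of why a static-shape compiler struggles with a BERT-like workload: every new (batch size, sequence length) pair misses the shape-keyed kernel cache and triggers a fresh compilation.

```python
# Hypothetical sketch: a static-shape compiler caches kernels per shape,
# so a workload with varying shapes keeps recompiling.
compiled_cache = {}

def compile_for_shape(shape):
    # Stand-in for an expensive static-shape compilation pass.
    return lambda: f"kernel for {shape}"

def run(batch_size, seq_len):
    shape = (batch_size, seq_len)
    if shape not in compiled_cache:   # cache miss => recompile
        compiled_cache[shape] = compile_for_shape(shape)
    return compiled_cache[shape]()

# A serving trace where both batch size and sequence length vary:
for bs, sl in [(1, 64), (8, 128), (1, 64), (4, 77), (8, 128), (2, 501)]:
    run(bs, sl)

print(len(compiled_cache))  # 4 distinct shapes => 4 compilations
```

A dynamic-shape compiler instead generates one kernel that is valid for a whole family of shapes, avoiding this cache blow-up.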
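The implicit-broadcast point can be illustrated with a small NumPy sketch (the helper name is hypothetical): under static shapes the compiler knows at compile time whether an elementwise add needs broadcasting, but with dynamic shapes that decision must be made at runtime.

```python
import numpy as np

def add_with_runtime_broadcast_check(a, b):
    # Hypothetical runtime dispatch: the shapes are only known here.
    if a.shape == b.shape:
        # Fast path: plain elementwise add, no broadcast logic needed.
        return a + b
    # Slow path: resolve the implicit broadcast at runtime.
    out_shape = np.broadcast_shapes(a.shape, b.shape)
    return np.broadcast_to(a, out_shape) + np.broadcast_to(b, out_shape)

x = np.ones((4, 8))
same = add_with_runtime_broadcast_check(x, np.ones((4, 8)))   # fast path
bcast = add_with_runtime_broadcast_check(x, np.ones((1, 8)))  # broadcast path
print(same.shape, bcast.shape)  # (4, 8) (4, 8)
```

A static-shape compiler would emit only one of these two paths; a dynamic-shape compiler must either emit both or prove one away using shape constraints.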
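The shape-constraint idea in the fusion bullet can be sketched as follows (a toy union-find over symbolic dimensions, not BladeDISC's actual representation): even when no dimension value is known, the compiler can record that an elementwise op's output dimensions equal its input's, and later prove two tensors same-shaped, which is what licenses fusing them.

```python
class ShapeEnv:
    """Toy union-find tracking equality constraints between symbolic dims."""
    def __init__(self):
        self.parent = {}

    def find(self, s):
        self.parent.setdefault(s, s)
        while self.parent[s] != s:
            s = self.parent[s]
        return s

    def unify(self, a, b):
        self.parent[self.find(a)] = self.find(b)

    def equal(self, a, b):
        return self.find(a) == self.find(b)

env = ShapeEnv()
# x has symbolic shape (x.0, x.1); y = relu(x) inherits x's dimensions.
env.unify("y.0", "x.0")
env.unify("y.1", "x.1")
# z = x + y: provably same-shaped, hence fusible, without concrete shapes.
print(env.equal("y.0", "x.0") and env.equal("y.1", "x.1"))  # True
```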
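The last architectural point, unifying compiler-generated and non-compiler-generated kernels, can be sketched as a runtime dispatcher (all names and the size heuristic are hypothetical): pre-tuned kernels win when available, vendor libraries handle large problems, and compiler-generated code is the fallback.

```python
# Hypothetical kernel sources behind one unified call site.
def vendor_matmul(a_shape, b_shape):   return "vendor library kernel"
def tuned_matmul(a_shape, b_shape):    return "TVM-tuned kernel"
def fallback_matmul(a_shape, b_shape): return "compiler-generated kernel"

# Shapes for which an ahead-of-time tuned kernel exists (assumed set).
TUNED_SHAPES = {((128, 64), (64, 256))}

def dispatch_matmul(a_shape, b_shape):
    if (a_shape, b_shape) in TUNED_SHAPES:
        return tuned_matmul(a_shape, b_shape)
    if a_shape[0] >= 64:  # toy heuristic: vendor libs shine on large GEMMs
        return vendor_matmul(a_shape, b_shape)
    return fallback_matmul(a_shape, b_shape)

print(dispatch_matmul((128, 64), (64, 256)))  # TVM-tuned kernel
print(dispatch_matmul((8, 64), (64, 256)))    # compiler-generated kernel
```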
Speakers:
Xiafei Qiu: Senior Technical Expert, Alibaba Cloud Computing Co., Ltd. The Alibaba Cloud PAI team is responsible for AI infrastructure inside and outside Alibaba Group, and model and system optimization has long been one of the team's key technical directions. The compiler, an important means of system optimization, has been honed internally for years and is now open source ([https://github.com/alibaba/BladeDISC](https://github.com/alibaba/BladeDISC)).
Other team shares include:
- Easier-to-Use and More Robust TensorRT via PAI-Blade: [https://www.nvidia.com/en-us/on-demand/session/gtcspring22-s41395/](https://www.nvidia.com/en-us/on-demand/session/gtcspring22-s41395/)
- Alibaba Cloud's universal, transparent performance solution based on an AI compiler: [https://www.nvidia.com/en-us/on-demand/session/gtcspring22-s41073/](https://www.nvidia.com/en-us/on-demand/session/gtcspring22-s41073/)
- "AStitch: Enabling A New Multi-Dimensional Optimization Space for Memory-Intensive ML Training and Inference on Modern SIMT Architectures", Zhen Zheng, Xuanda Yang, Pengzhan Zhao, Guoping Long, Kai Zhu, Feiwen Zhu, Wenyi Zhao, Xiaoyong Liu, Jun Yang, Jidong Zhai, Shuaiwen Leon Song, and Wei Lin. ASPLOS 2022
- Model optimization (compilation technology) introduction and best practices from the Aurora advertising system: https://yunqi.aliyun.com/2021/agenda/session147