AGO: Boosting Mobile AI Inference Performance by Removing Constraints on Graph Optimization

Z Xu, H Peng, W Wang. IEEE INFOCOM 2023 - IEEE Conference on Computer Communications, 2023. ieeexplore.ieee.org
Traditional deep learning compilers rely on heuristics for subgraph generation, which impose extra constraints on graph optimization, e.g., each subgraph can only contain at most one complex operator. In this paper, we propose AGO, a framework for graph optimization with arbitrary structures to boost the inference performance of deep models by removing such constraints. To create new optimization opportunities for complicated subgraphs, we propose intensive operator fusion, which effectively stitches multiple complex operators together for better performance. Further, we design a graph partitioning scheme that allows an arbitrary structure for each subgraph while guaranteeing the acyclic property among all generated subgraphs. Additionally, to enable efficient performance tuning for complicated subgraphs, we devise a divide-and-conquer tuning mechanism to orchestrate different system components. Through extensive experiments on various neural networks and mobile devices, we show that our system can improve the inference performance by up to 3.3× when compared with state-of-the-art vendor libraries and deep compilers.
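The abstract notes that AGO's partitioner allows arbitrary subgraph structures while "guaranteeing the acyclic property among all generated subgraphs." As a minimal sketch of what that check involves (the function name, data layout, and grouping are illustrative assumptions, not AGO's actual implementation): condense the operator DAG by subgraph assignment and verify the condensed graph is acyclic with Kahn's algorithm.

```python
from collections import defaultdict, deque

def subgraph_partition_is_acyclic(edges, assignment):
    """Hypothetical acyclicity check for a graph partition.

    edges: list of (src_op, dst_op) pairs in the operator DAG.
    assignment: maps each operator to a subgraph id.
    Returns True iff the condensed graph of subgraphs has no cycle.
    """
    succ = defaultdict(set)
    indeg = defaultdict(int)
    nodes = set(assignment.values())
    # Build the condensed graph: one node per subgraph, edges between
    # distinct subgraphs only (deduplicated).
    for u, v in edges:
        a, b = assignment[u], assignment[v]
        if a != b and b not in succ[a]:
            succ[a].add(b)
            indeg[b] += 1
    # Kahn's algorithm: the condensation is acyclic iff every node drains.
    queue = deque(n for n in nodes if indeg[n] == 0)
    seen = 0
    while queue:
        n = queue.popleft()
        seen += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return seen == len(nodes)

# Toy example: ops A -> B -> C plus A -> C. Fusing A and C into one
# subgraph while leaving B outside creates a cycle between subgraphs,
# so that grouping must be rejected; fusing A and B is fine.
edges = [("A", "B"), ("B", "C"), ("A", "C")]
print(subgraph_partition_is_acyclic(edges, {"A": 0, "B": 1, "C": 0}))  # False
print(subgraph_partition_is_acyclic(edges, {"A": 0, "B": 0, "C": 1}))  # True
```

The toy example shows why unconstrained fusion needs such a guard: grouping non-adjacent operators can introduce a dependency cycle between subgraphs, which would make a valid execution order impossible.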