Author affiliations: Beihang Univ, Beijing, Peoples R China; Stanford Univ, Stanford, CA, USA; Tsinghua Univ, Beijing, Peoples R China
Publication: JOURNAL OF SYSTEMS ARCHITECTURE
Year/Volume: 2024, Vol. 152
Subject classification: 08 [Engineering]; 0835 [Engineering - Software Engineering]; 0812 [Engineering - Computer Science and Technology (degrees conferrable in Engineering or Science)]
Funding: National Key Research and Development Program of China [2021ZD0110202]; Academic Excellence Foundation of BUAA; Shuimu Tsinghua Scholar Program
Keywords: Edge computing; Model inference; Dataflow-centric; Computation graph; Data locality
Abstract: Edge computing has been emerging as a popular scenario for model inference. However, inference performance on edge devices (e.g., multi-core DSPs, FPGAs, etc.) suffers from inefficiency due to the lack of highly optimized inference frameworks. Previous model inference frameworks are mainly developed in an operator-centric way, which provides insufficient acceleration for edge-based inference. Moreover, operator-centric frameworks incur significant costs for continuous development and maintenance. To address these drawbacks of operator-centric frameworks, we design Xenos, which automatically conducts dataflow-centric optimization of the computation graph and accelerates inference in two dimensions. Vertically, Xenos develops an operator-linking technique to improve data locality by restructuring the inter-operator dataflow. Horizontally, Xenos develops a DSP-aware operator-split technique to enable higher parallelism across multiple DSP units. Our evaluation demonstrates the effectiveness of the vertical and horizontal dataflow optimizations, which reduce the inference time by 15.0%-84.9% and 17.9%-89.9%, respectively. Besides, Xenos also outperforms the widely used TVM by 1.1x-1.9x. Moreover, we extend Xenos to a distributed solution, which we call d-Xenos. d-Xenos employs multiple edge devices to jointly conduct the inference task and achieves a speedup of 3.68x-3.78x over a single device.
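To make the two dataflow-centric ideas in the abstract concrete, the sketch below is a minimal, hypothetical illustration (not Xenos's actual API or implementation): "operator linking" is shown as fusing adjacent element-wise operators so the intermediate tensor stays local instead of being written back between operators, and "operator split" is shown as partitioning one operator's input across several parallel workers standing in for DSP units. All names (link_ops, split_op, NUM_DSP) are assumptions made for illustration.

```python
# Illustrative sketch of vertical (operator linking) and horizontal
# (DSP-aware operator split) dataflow optimization; names are hypothetical.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

NUM_DSP = 4  # assumed number of DSP units on the edge device

def link_ops(*ops):
    """Vertical: compose element-wise operators into one fused kernel so the
    intermediate result stays in local memory rather than round-tripping."""
    def fused(x):
        for op in ops:
            x = op(x)  # each operator consumes the previous result directly
        return x
    return fused

def split_op(op, x, num_units=NUM_DSP):
    """Horizontal: shard the input along its first axis and run each shard
    on a separate worker (a stand-in for one DSP unit)."""
    shards = np.array_split(x, num_units, axis=0)
    with ThreadPoolExecutor(max_workers=num_units) as pool:
        parts = list(pool.map(op, shards))
    return np.concatenate(parts, axis=0)

if __name__ == "__main__":
    x = np.random.rand(1024, 256).astype(np.float32)
    relu = lambda t: np.maximum(t, 0.0)
    scale = lambda t: t * 0.5
    fused = link_ops(scale, relu)   # vertical: one pass instead of two
    y = split_op(fused, x)          # horizontal: shards across "DSP" workers
    assert y.shape == x.shape
```

In this toy form, linking saves one full write/read of the intermediate tensor, and splitting lets the fused kernel run on all units in parallel; the paper's techniques apply the same two transformations at the computation-graph level with DSP-aware partitioning.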