Recently, applications using artificial intelligence (AI) techniques on mobile devices, such as augmented reality, have become pervasive. The hardware specifications of mobile devices, dynamic service demands, stochastic network states, and the characteristics of DNN (Deep Neural Network) models all affect the quality of experience (QoE) of such applications. In this paper, we propose CutEdge, which leverages a virtual queue-based Lyapunov optimization framework to jointly optimize DNN model partitioning between a mobile device and a mobile edge computing (MEC) server, along with the processing/networking resources of the mobile device, with respect to internal/external system dynamics. Specifically, CutEdge simultaneously decides (i) the partition point of the DNN model between the mobile device and the MEC server, (ii) the GPU clock frequency, and (iii) the transmission rate of the mobile device. We then theoretically derive the optimal trade-off curves among energy consumption, throughput, and end-to-end latency yielded by CutEdge; these QoE metrics have not been jointly addressed in previous studies. Moreover, we show the impact of jointly optimizing the three control parameters on performance via real trace-driven simulations. Finally, we demonstrate the superiority of CutEdge over existing algorithms through experiments on a testbed built with an embedded AI device and an MEC server.
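As a rough illustration of the per-slot drift-plus-penalty control that a virtual queue-based Lyapunov framework performs, the Python sketch below searches the three control dimensions named above (partition point, GPU clock frequency, transmission rate). All constants and the energy/service models are hypothetical placeholders for illustration, not CutEdge's actual formulation.

```python
# A minimal drift-plus-penalty sketch, assuming hypothetical per-slot
# cost/service models; CutEdge's real queue definitions and dynamics
# are given in the paper and are not reproduced here.
import itertools

PARTITION_POINTS = range(0, 10)      # candidate DNN split layers (hypothetical)
GPU_FREQS = [0.6e9, 0.9e9, 1.3e9]    # GPU clock frequencies in Hz (hypothetical)
TX_RATES = [5e6, 10e6, 20e6]         # uplink rates in bit/s (hypothetical)
V = 50.0                             # Lyapunov trade-off weight (energy vs. backlog)

def energy(p, f, r):
    """Hypothetical per-slot energy: dynamic GPU power plus radio power."""
    local_work = p / len(PARTITION_POINTS)   # fraction of layers run on device
    return 1e-27 * f**3 * local_work + 1e-7 * r

def service(p, f, r):
    """Hypothetical frames served per slot for a given control triple."""
    local_latency = p * 1e8 / f                                   # on-device compute
    tx_latency = (1.0 - p / len(PARTITION_POINTS)) * 8e5 / r      # uplink transfer
    return 1.0 / (local_latency + tx_latency + 1e-9)

def drift_plus_penalty_step(queue, arrivals):
    """Pick (p, f, r) minimizing V*energy - queue*service, then update the queue."""
    best = min(
        itertools.product(PARTITION_POINTS, GPU_FREQS, TX_RATES),
        key=lambda c: V * energy(*c) - queue * service(*c),
    )
    queue = max(queue + arrivals - service(*best), 0.0)  # virtual queue update
    return best, queue
```

Larger V trades a longer backlog (and hence latency) for lower energy, which is the knob behind the energy/throughput/latency trade-off curves the paper derives.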
EdgeAI is a compelling approach for deploying DNN models at the network edge through model partitioning. However, most existing partitioning strategies concentrate on homogeneous environments, neglecting the effect of device placement and thus failing to apply in heterogeneous settings. Moreover, these strategies often rely on either data parallelism or model parallelism, each with its own limitations, including data synchronization and communication overhead. This paper aims to enhance inference performance through a pipeline system of devices that leverages both the parallel and sequential relationships among them. Accordingly, the problem of Multi-Device Cooperative DNN Inference is formulated by optimizing both device placement and model partitioning, taking into account the unique characteristics of heterogeneous edge resources and DNN models, with the goal of maximizing throughput. To this end, we propose an evolutionary device placement technique that determines the pipeline stage of each device by enhancing a variant of particle swarm optimization. Subsequently, an adaptive model partitioning strategy is developed that combines intra-layer and inter-layer model partitioning, based on dynamic programming and the input-output mapping of DNN layers respectively, to accommodate edge resource limitations. Finally, we construct a simulation model and a prototype, and extensive results demonstrate that our proposed algorithm outperforms current state-of-the-art algorithms.
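To make the device-placement half of this formulation concrete, the toy sketch below runs a discrete particle-swarm-style search that assigns devices to pipeline stages and scores a placement by its bottleneck stage, since pipeline throughput is limited by the slowest stage. The rate model (DEVICE_RATE) and the update rule are invented for illustration; the paper's enhanced PSO variant and DP-based partitioning are considerably more involved.

```python
# A toy discrete PSO sketch for device placement, assuming a hypothetical
# additive stage-rate model; not the paper's enhanced PSO variant.
import random

N_DEVICES, N_STAGES, SWARM, ITERS = 6, 3, 20, 100
DEVICE_RATE = [random.uniform(1.0, 5.0) for _ in range(N_DEVICES)]  # hypothetical

def throughput(placement):
    """Pipeline throughput is bounded by its slowest stage."""
    stage_rate = [0.0] * N_STAGES
    for dev, stage in enumerate(placement):
        stage_rate[stage] += DEVICE_RATE[dev]
    return min(stage_rate) if all(stage_rate) else 0.0  # empty stage = no pipeline

def pso_placement():
    swarm = [[random.randrange(N_STAGES) for _ in range(N_DEVICES)]
             for _ in range(SWARM)]
    best = max(swarm, key=throughput)[:]
    for _ in range(ITERS):
        for particle in swarm:
            for d in range(N_DEVICES):
                # Discrete "velocity": drift toward the global best,
                # with occasional random exploration.
                if random.random() < 0.5:
                    particle[d] = best[d]
                elif random.random() < 0.1:
                    particle[d] = random.randrange(N_STAGES)
            if throughput(particle) > throughput(best):
                best = particle[:]
    return best, throughput(best)

print(pso_placement())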
ISBN (digital): 9781728186719
ISBN (print): 9781728186719
The collaborative inference approach splits a Deep Neural Network (DNN) model into two parts that run collaboratively on the end device and the cloud server to minimize inference latency and protect data privacy, especially in the 5G era. The DNN model partitioning scheme depends on the network bandwidth. However, in dynamic mobile networks, resource-constrained devices cannot efficiently execute complex model partitioning algorithms to obtain the optimal partition in real time. In this paper, to overcome this challenge, we first formulate the model partitioning problem as a min-cut problem to find the optimal partition. Second, we propose a Collaborative Inference method based on model Compression, named CIC. CIC makes model partitioning algorithms efficient to execute on resource-constrained end devices by reducing the algorithm's complexity. CIC generates a splitting model based on the inherent characteristics of the DNN model and the platform resources. The splitting models are independent of the network environment, are generated offline, and can be reused in the current environment. CIC has excellent compressibility: even DNN models with hundreds of layers can be rapidly partitioned on resource-constrained devices. Experimental results show that our method is significantly more effective than existing solutions, speeding up model partitioning decisions by up to 100x, reducing inference latency by up to 2.6x, and increasing throughput by up to 3.3x in the best case.
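The min-cut formulation can be made concrete on a toy layer chain: model each layer as a node, attach it to a source (device side) and sink (cloud side) with edges weighted by per-layer compute costs, and weight inter-layer edges by tensor-transfer costs, so that a minimum s-t cut yields the cheapest device/cloud assignment. The sketch below uses networkx; all cost values are invented, and CIC's offline compression of this graph is not shown.

```python
# A minimal min-cut sketch for DNN partitioning over a 4-layer chain,
# with hypothetical per-layer costs; CIC's compression step is omitted.
import networkx as nx

device_cost = [4.0, 6.0, 8.0, 3.0]   # hypothetical on-device latency per layer
cloud_cost  = [1.0, 1.5, 2.0, 0.8]   # hypothetical cloud latency per layer
tx_cost     = [9.0, 5.0, 2.0]        # hypothetical cost of shipping layer i's output

G = nx.DiGraph()
n = len(device_cost)
for i in range(n):
    # Cutting edge ('src', i) places layer i on the cloud (pays cloud_cost);
    # cutting edge (i, 'dst') places it on the device (pays device_cost).
    G.add_edge('src', i, capacity=cloud_cost[i])
    G.add_edge(i, 'dst', capacity=device_cost[i])
for i in range(n - 1):
    # Neighboring layers on different sides pay the transfer cost.
    G.add_edge(i, i + 1, capacity=tx_cost[i])
    G.add_edge(i + 1, i, capacity=tx_cost[i])

cut_value, (device_side, cloud_side) = nx.minimum_cut(G, 'src', 'dst')
print('total cost:', cut_value)
print('layers on device:', sorted(l for l in device_side if l != 'src'))
```

Because the graph depends only on the model and platform (bandwidth only rescales tx_cost), such graphs can be solved offline for a range of bandwidths, which matches the paper's point that the splitting models are network-independent and reusable.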
Deep Neural Networks (DNNs) have gained widespread popularity across application domains due to their dominant performance. Despite the prevalence of massively parallel multi-core processor architectures, adopting large DNN models in embedded systems remains challenging, as most embedded applications are designed with single-core processors in mind. This limits DNN adoption in embedded systems because model parallelization and workload partitioning are leveraged inefficiently. Prior solutions attempt to address these challenges using data and model parallelism, but they fall short in finding optimal DNN model partitions and distributing them efficiently to achieve improved performance. This paper proposes a DNN model parallelism framework that accelerates model training by finding the optimal number of model partitions and resource provisions. The proposed framework combines data and model parallelism techniques to optimize the parallel processing of DNNs for embedded applications. In addition, it implements pipelined execution of the partitioned models and integrates a task controller to manage the computing resources. Experimental results for image object detection demonstrate the applicability of our proposed framework in estimating execution time and reducing overall model training time by almost 44.87% compared to the baseline AlexNet convolutional neural network (CNN) model.
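As a bare-bones sketch of the pipelined execution of partitioned models that such a framework implements, the Python below chains stage workers with queues so that consecutive micro-batches overlap across partitions. The partition callables are stand-ins, and the task controller and resource provisioning are abstracted away; this is an illustration of the pipelining pattern, not the framework's actual design.

```python
# A minimal pipeline-parallel sketch: each partition runs in its own
# thread and passes micro-batches downstream through queues.
import queue
import threading

def stage_worker(fn, inbox, outbox):
    """Run one model partition: pull a micro-batch, process it, pass it on."""
    while True:
        item = inbox.get()
        if item is None:          # poison pill: shut down and propagate
            outbox.put(None)
            return
        outbox.put(fn(item))

def run_pipeline(partitions, micro_batches):
    """Chain partitions with queues so stages overlap on different micro-batches."""
    queues = [queue.Queue() for _ in range(len(partitions) + 1)]
    for fn, inbox, outbox in zip(partitions, queues, queues[1:]):
        threading.Thread(target=stage_worker, args=(fn, inbox, outbox),
                         daemon=True).start()
    for mb in micro_batches:
        queues[0].put(mb)
    queues[0].put(None)
    results = []
    while (out := queues[-1].get()) is not None:
        results.append(out)
    return results

# Usage: three toy "partitions" standing in for consecutive slices of a CNN.
parts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
print(run_pipeline(parts, range(5)))  # [-1, 1, 3, 5, 7]
```

With balanced partitions, steady-state throughput approaches one micro-batch per stage time rather than one per full forward pass, which is the gain pipeline execution buys over running partitions sequentially.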