Recently, applications using artificial intelligence (AI) techniques on mobile devices, such as augmented reality, have become pervasive. The hardware specifications of mobile devices, dynamic service demands, stochastic network states, and the characteristics of DNN (Deep Neural Network) models all affect the quality of experience (QoE) of such applications. In this paper, we propose CutEdge, which leverages a virtual queue-based Lyapunov optimization framework to jointly optimize DNN model partitioning between a mobile device and a mobile edge computing (MEC) server, along with the processing/networking resources of the mobile device, with respect to internal/external system dynamics. Specifically, CutEdge simultaneously decides (i) the partition point of the DNN model between the mobile device and the MEC server, (ii) the GPU clock frequency, and (iii) the transmission rate of the mobile device. We then theoretically derive the optimal trade-off curves among energy consumption, throughput, and end-to-end latency yielded by CutEdge; these QoE metrics have not been jointly addressed in previous studies. Moreover, we show the impact of jointly optimizing the three control parameters on performance via real trace-driven simulations. Finally, we demonstrate the superiority of CutEdge over existing algorithms through experiments on a testbed built with an embedded AI device and an MEC server.
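As a rough illustration of the per-slot drift-plus-penalty control that a virtual queue-based Lyapunov framework performs, the Python sketch below searches the three control dimensions named above (partition point, GPU clock frequency, transmission rate). All constants and the energy/service models are hypothetical placeholders for illustration, not CutEdge's actual formulation.

```python
# A minimal drift-plus-penalty sketch, assuming hypothetical per-slot
# cost/service models; CutEdge's real queue definitions and dynamics
# are given in the paper and are not reproduced here.
import itertools

PARTITION_POINTS = range(0, 10)      # candidate DNN split layers (hypothetical)
GPU_FREQS = [0.6e9, 0.9e9, 1.3e9]    # GPU clock frequencies in Hz (hypothetical)
TX_RATES = [5e6, 10e6, 20e6]         # uplink rates in bit/s (hypothetical)
V = 50.0                             # Lyapunov trade-off weight (energy vs. backlog)

def energy(p, f, r):
    """Hypothetical per-slot energy: dynamic GPU power plus radio power."""
    local_work = p / len(PARTITION_POINTS)   # fraction of layers run on device
    return 1e-27 * f**3 * local_work + 1e-7 * r

def service(p, f, r):
    """Hypothetical frames served per slot for a given control triple."""
    local_latency = p * 1e8 / f                                   # on-device compute
    tx_latency = (1.0 - p / len(PARTITION_POINTS)) * 8e5 / r      # uplink transfer
    return 1.0 / (local_latency + tx_latency + 1e-9)

def drift_plus_penalty_step(queue, arrivals):
    """Pick (p, f, r) minimizing V*energy - queue*service, then update the queue."""
    best = min(
        itertools.product(PARTITION_POINTS, GPU_FREQS, TX_RATES),
        key=lambda c: V * energy(*c) - queue * service(*c),
    )
    queue = max(queue + arrivals - service(*best), 0.0)  # virtual queue update
    return best, queue
```

Larger V trades a longer backlog (and hence latency) for lower energy, which is the knob behind the energy/throughput/latency trade-off curves the paper derives.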
EdgeAI is a compelling approach for deploying DNN models at the network edge through model partitioning. However, most existing partitioning strategies concentrate on homogeneous environments, neglecting the effect of device placement and thus failing to apply in heterogeneous settings. Moreover, these strategies often rely on either data parallelism or model parallelism, each with its own limitations, including data synchronization and communication overhead. This paper aims to enhance inference performance through a pipeline system of devices that leverages both the parallel and sequential relationships among them. Accordingly, the problem of Multi-Device Cooperative DNN Inference is formulated by optimizing both device placement and model partitioning, taking into account the unique characteristics of heterogeneous edge resources and DNN models, with the goal of maximizing throughput. To this end, we propose an evolutionary device placement technique that determines the pipeline stage of each device by enhancing a variant of particle swarm optimization. Subsequently, an adaptive model partitioning strategy is developed that combines intra-layer and inter-layer model partitioning, based on dynamic programming and the input-output mapping of DNN layers respectively, to accommodate edge resource limitations. Finally, we construct a simulation model and a prototype, and extensive results demonstrate that our proposed algorithm outperforms current state-of-the-art algorithms.
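To make the device-placement half of this formulation concrete, the toy sketch below runs a discrete particle-swarm-style search that assigns devices to pipeline stages and scores a placement by its bottleneck stage, since pipeline throughput is limited by the slowest stage. The rate model (DEVICE_RATE) and the update rule are invented for illustration; the paper's enhanced PSO variant and DP-based partitioning are considerably more involved.

```python
# A toy discrete PSO sketch for device placement, assuming a hypothetical
# additive stage-rate model; not the paper's enhanced PSO variant.
import random

N_DEVICES, N_STAGES, SWARM, ITERS = 6, 3, 20, 100
DEVICE_RATE = [random.uniform(1.0, 5.0) for _ in range(N_DEVICES)]  # hypothetical

def throughput(placement):
    """Pipeline throughput is bounded by its slowest stage."""
    stage_rate = [0.0] * N_STAGES
    for dev, stage in enumerate(placement):
        stage_rate[stage] += DEVICE_RATE[dev]
    return min(stage_rate) if all(stage_rate) else 0.0  # empty stage = no pipeline

def pso_placement():
    swarm = [[random.randrange(N_STAGES) for _ in range(N_DEVICES)]
             for _ in range(SWARM)]
    best = max(swarm, key=throughput)[:]
    for _ in range(ITERS):
        for particle in swarm:
            for d in range(N_DEVICES):
                # Discrete "velocity": drift toward the global best,
                # with occasional random exploration.
                if random.random() < 0.5:
                    particle[d] = best[d]
                elif random.random() < 0.1:
                    particle[d] = random.randrange(N_STAGES)
            if throughput(particle) > throughput(best):
                best = particle[:]
    return best, throughput(best)

print(pso_placement())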
ISBN (digital): 9781728186719
ISBN (print): 9781728186719
The collaborative inference approach splits a Deep Neural Network (DNN) model into two parts that run collaboratively on the end device and the cloud server to minimize inference latency and protect data privacy, especially in the 5G era. The DNN model partitioning scheme depends on the network bandwidth. However, in dynamic mobile networks, resource-constrained devices cannot efficiently execute complex model partitioning algorithms to obtain the optimal partition in real time. In this paper, to overcome this challenge, we first formulate the model partitioning problem as a min-cut problem to find the optimal partition. Second, we propose a Collaborative Inference method based on model Compression, named CIC. CIC makes model partitioning algorithms efficient to execute on resource-constrained end devices by reducing the algorithm's complexity. CIC generates a splitting model based on the inherent characteristics of the DNN model and the platform resources. The splitting models are independent of the network environment, are generated offline, and can be reused in the current environment. CIC has excellent compressibility: even DNN models with hundreds of layers can be rapidly partitioned on resource-constrained devices. Experimental results show that our method is significantly more effective than existing solutions, speeding up model partitioning decisions by up to 100x, reducing inference latency by up to 2.6x, and increasing throughput by up to 3.3x in the best case.
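The min-cut formulation can be made concrete on a toy layer chain: model each layer as a node, attach it to a source (device side) and sink (cloud side) with edges weighted by per-layer compute costs, and weight inter-layer edges by tensor-transfer costs, so that a minimum s-t cut yields the cheapest device/cloud assignment. The sketch below uses networkx; all cost values are invented, and CIC's offline compression of this graph is not shown.

```python
# A minimal min-cut sketch for DNN partitioning over a 4-layer chain,
# with hypothetical per-layer costs; CIC's compression step is omitted.
import networkx as nx

device_cost = [4.0, 6.0, 8.0, 3.0]   # hypothetical on-device latency per layer
cloud_cost  = [1.0, 1.5, 2.0, 0.8]   # hypothetical cloud latency per layer
tx_cost     = [9.0, 5.0, 2.0]        # hypothetical cost of shipping layer i's output

G = nx.DiGraph()
n = len(device_cost)
for i in range(n):
    # Cutting edge ('src', i) places layer i on the cloud (pays cloud_cost);
    # cutting edge (i, 'dst') places it on the device (pays device_cost).
    G.add_edge('src', i, capacity=cloud_cost[i])
    G.add_edge(i, 'dst', capacity=device_cost[i])
for i in range(n - 1):
    # Neighboring layers on different sides pay the transfer cost.
    G.add_edge(i, i + 1, capacity=tx_cost[i])
    G.add_edge(i + 1, i, capacity=tx_cost[i])

cut_value, (device_side, cloud_side) = nx.minimum_cut(G, 'src', 'dst')
print('total cost:', cut_value)
print('layers on device:', sorted(l for l in device_side if l != 'src'))
```

Because the graph depends only on the model and platform (bandwidth only rescales tx_cost), such graphs can be solved offline for a range of bandwidths, which matches the paper's point that the splitting models are network-independent and reusable.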
Deep Neural Networks (DNNs) have gained widespread popularity across application domains due to their dominant performance. Despite the prevalence of massively parallel multi-core processor architectures, adopting large DNN models in embedded systems remains challenging, as most embedded applications are designed with single-core processors in mind. This limits DNN adoption in embedded systems because model parallelization and workload partitioning are leveraged inefficiently. Prior solutions attempt to address these challenges using data and model parallelism, but they fall short in finding optimal DNN model partitions and distributing them efficiently to achieve improved performance. This paper proposes a DNN model parallelism framework that accelerates model training by finding the optimal number of model partitions and resource provisions. The proposed framework combines data and model parallelism techniques to optimize the parallel processing of DNNs for embedded applications. In addition, it implements pipelined execution of the partitioned models and integrates a task controller to manage the computing resources. Experimental results for image object detection demonstrate the applicability of our proposed framework in estimating execution time and reducing overall model training time by almost 44.87% compared to the baseline AlexNet convolutional neural network (CNN) model.
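As a bare-bones sketch of the pipelined execution of partitioned models that such a framework implements, the Python below chains stage workers with queues so that consecutive micro-batches overlap across partitions. The partition callables are stand-ins, and the task controller and resource provisioning are abstracted away; this is an illustration of the pipelining pattern, not the framework's actual design.

```python
# A minimal pipeline-parallel sketch: each partition runs in its own
# thread and passes micro-batches downstream through queues.
import queue
import threading

def stage_worker(fn, inbox, outbox):
    """Run one model partition: pull a micro-batch, process it, pass it on."""
    while True:
        item = inbox.get()
        if item is None:          # poison pill: shut down and propagate
            outbox.put(None)
            return
        outbox.put(fn(item))

def run_pipeline(partitions, micro_batches):
    """Chain partitions with queues so stages overlap on different micro-batches."""
    queues = [queue.Queue() for _ in range(len(partitions) + 1)]
    for fn, inbox, outbox in zip(partitions, queues, queues[1:]):
        threading.Thread(target=stage_worker, args=(fn, inbox, outbox),
                         daemon=True).start()
    for mb in micro_batches:
        queues[0].put(mb)
    queues[0].put(None)
    results = []
    while (out := queues[-1].get()) is not None:
        results.append(out)
    return results

# Usage: three toy "partitions" standing in for consecutive slices of a CNN.
parts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
print(run_pipeline(parts, range(5)))  # [-1, 1, 3, 5, 7]
```

With balanced partitions, steady-state throughput approaches one micro-batch per stage time rather than one per full forward pass, which is the gain pipeline execution buys over running partitions sequentially.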