ISBN:
(Print) 9781424418893
This paper considers the problem of extending the battery lifetime of a portable computer by offloading its computation to a server. Depending on the inputs, the computation time of different instances of a program can vary significantly and is often difficult to predict. Unlike previous studies on computation offloading, our approach does not require estimating the computation time before execution. We execute the program initially on the portable client with a timeout. If the computation is not completed by the timeout, it is offloaded to the server. We first set the timeout to the minimum computation time that can benefit from offloading. This method is proved to be 2-competitive. We further consider collecting online statistics of the computation time and finding the statistically optimal timeout. Finally, we provide guidelines for constructing programs with computation offloading. Experiments show that our methods can save up to 17% more energy than existing approaches.
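The timeout policy described in this abstract can be illustrated with a minimal energy model. The power and energy constants below are invented for illustration, and the assumption that the server recomputes the task from scratch after offloading is ours, not necessarily the paper's:

```python
# Hedged sketch of the timeout-then-offload heuristic with invented constants.
P_LOCAL = 1.0    # assumed power draw while computing locally (J/s)
P_IDLE = 0.5     # assumed power draw while waiting for the server (J/s)
E_SEND = 2.0     # assumed one-off energy to ship the task state (J)

def breakeven_timeout():
    # Minimum local runtime t at which offloading starts to pay off:
    # t * P_LOCAL >= E_SEND + t * P_IDLE  =>  t >= E_SEND / (P_LOCAL - P_IDLE)
    return E_SEND / (P_LOCAL - P_IDLE)

def energy_timeout_scheme(t):
    """Energy of: run locally until the break-even timeout, offload if unfinished."""
    tau = breakeven_timeout()
    if t <= tau:
        return t * P_LOCAL
    # Wasted local work up to tau, then transmission plus idle waiting.
    return tau * P_LOCAL + E_SEND + t * P_IDLE

def energy_clairvoyant(t):
    """Offline optimum that knows the computation time t in advance."""
    return min(t * P_LOCAL, E_SEND + t * P_IDLE)
```

Setting the timeout at the break-even point means the worst case, a job finishing just past the timeout, pays at most twice the clairvoyant cost, which is the intuition behind the 2-competitive claim.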
To reduce the aggravated communication overhead of computation offloading in mobile cloud computing networks, a novel optimal resource allocation scheme was proposed to minimize the total completion time of remote tasks offloaded by multiple mobile terminals, under the constraint that the completion time of each remote task is less than a preset threshold. First, the resource allocation optimization problem for multiple mobile terminals was formulated. Then, an algorithm was presented to test the feasibility of satisfying the completion time constraint of every remote task simultaneously. Moreover, an efficient scheme was developed to search for the optimal solution of the resource allocation problem. Simulation results show the validity of the proposed optimal resource allocation scheme.
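The feasibility test described above can be sketched under a simple model (the model and all names are ours): the server grants each remote task a compute rate from a shared capacity, and a task with workload `w` cycles, communication delay `c`, and deadline `d` needs rate at least `w / (d - c)`:

```python
def min_rates(tasks):
    """tasks: list of (workload_cycles, comm_time_s, deadline_s).
    Returns the minimum per-task server rates, or None if some
    deadline is unmeetable regardless of allocation."""
    rates = []
    for w, c, d in tasks:
        slack = d - c  # time left for server computation
        if slack <= 0:
            return None
        rates.append(w / slack)
    return rates

def feasible(tasks, capacity):
    """All deadlines can be met iff the minimum rates fit the capacity."""
    rates = min_rates(tasks)
    return rates is not None and sum(rates) <= capacity
```

If the check passes, the leftover capacity `capacity - sum(rates)` is what an optimizer can redistribute to shrink the total completion time.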
Time series-based prediction methods have a wide range of uses in embedded systems. Many OS algorithms and applications require accurate prediction of demand and supply of resources. However, configuring prediction algorithms is not easy, since the dynamics of the underlying data requires continuous observation of the prediction error and dynamic adaptation of the parameters to achieve high accuracy. Current prediction methods are either too costly to implement on resource-constrained devices or their parameterization is static, making them inappropriate and inaccurate for a wide range of datasets. This paper presents NWSLite, a prediction utility that addresses these shortcomings on resource-restricted platforms.
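As a rough illustration of online parameter adaptation (this is not the NWSLite algorithm, just a generic stand-in), a predictor can run several exponential smoothers in parallel and answer with the one whose accumulated error is currently lowest:

```python
class AdaptivePredictor:
    """Toy self-tuning smoother: tracks a bank of exponential smoothers
    with different alphas and predicts with the currently best one."""

    def __init__(self, alphas=(0.1, 0.5, 0.9)):
        self.alphas = alphas
        self.estimates = {a: None for a in alphas}
        self.sq_err = {a: 0.0 for a in alphas}

    def update(self, value):
        for a in self.alphas:
            est = self.estimates[a]
            if est is None:
                self.estimates[a] = value
            else:
                # Score each smoother by its one-step-ahead squared error.
                self.sq_err[a] += (est - value) ** 2
                self.estimates[a] = a * value + (1 - a) * est

    def predict(self):
        best = min(self.alphas, key=lambda a: self.sq_err[a])
        return self.estimates[best]
```

The per-update cost is a handful of multiply-adds per candidate alpha, which is the kind of budget a resource-constrained device can afford.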
Many programs can be invoked under different execution options, input parameters, and data files. Such different execution contexts may lead to strikingly different execution instances, and the optimal code generation may be sensitive to them. In this paper, we show how to use parametric program analysis to deal with this issue for the optimization problem of computation offloading. Computation offloading has been shown to be an effective way to improve performance and save energy on mobile devices. Optimal program partitioning for computation offloading depends on the tradeoff between the computation workload and the communication cost, both of which may change with different execution instances. Optimal partitioning decisions must therefore be made at run time, when sufficient information about workload and communication becomes available. Our cost analysis expresses the program's computation workload and communication cost as functions of run-time parameters, and our parametric partitioning algorithm finds the optimal program partitioning for different ranges of those parameters. At run time, the transformed program self-schedules its tasks on either the mobile device or the server, based on the optimal partitioning that corresponds to the current values of the run-time parameters. Experimental results on an HP iPAQ handheld device show that different run-time parameters can lead to quite different partitioning decisions.
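The idea of parametric partitioning can be illustrated with invented cost functions of a single run-time parameter `n` (the coefficients below are ours, for illustration only): solving `local_cost(n) = remote_cost(n)` once, statically, yields a threshold, so the run-time decision reduces to a single comparison:

```python
# Costs as functions of a run-time parameter n (e.g. problem size).
def local_cost(n):
    return 3.0 * n * n               # assumed on-device computation cost

def remote_cost(n):
    return 0.5 * n * n + 40.0 * n    # assumed server cost plus O(n) communication

def threshold():
    # local_cost(n) = remote_cost(n)  =>  2.5 n^2 = 40 n  =>  n = 16
    return 40.0 / 2.5

def place(n):
    """Run-time scheduling decision: just compare n to the precomputed threshold."""
    return "server" if n > threshold() else "mobile"
```

This mirrors the paper's scheme in miniature: the expensive analysis happens offline, and the transformed program only evaluates cheap threshold tests when the actual parameter values become known.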
ISBN:
(Print) 9781402062650
In this paper we present a framework that acts as a distributed media encoder/decoder for real-time multimedia streams. The paper proposes an implementation of a multimedia encoder/decoder that partitions the tasks allocated to the different stages of the encoder/decoder and distributes them to computers having the minimum required capabilities for each task. The combined work of these nodes then produces the actual encoded/decoded multimedia stream. As encoding is a resource-hungry process, we divide it into separable stages and perform their tasks on multiple nodes, while decoding is performed on the single intended target device if it is capable of doing so. If the target device is less capable, the middleware can convert the encoded video into a format suitable for the client node.
ISBN:
(Print) 1595933816
We present a systematic methodology for exploring the security processing software architecture of a commercial heterogeneous multiprocessor system-on-chip (SoC) for mobile devices. The SoC contains multiple host processors executing applications and a dedicated programmable security processing engine. We developed an exploration methodology to map the code and data of security software libraries onto the platform, with the objective of maximizing the overall application-visible performance. The salient features of the methodology include (i) the use of real performance measurements from a prototyping board that contains the target platform to drive the exploration, (ii) a new data structure access profiling framework that allows us to accurately model the communication overheads involved in offloading a given set of functions to the security processor, and (iii) an exact branch-and-bound based design space exploration algorithm that determines the best mapping of security library functions and data structures to the host and security processors. We used the proposed framework to map a commercial security library to the target mobile application SoC. The resulting optimized software architecture outperformed several manually designed software architectures, yielding up to 12.5X speedup for individual cryptographic operations (encryption, hashing) and 2.2X-6.2X speedup for applications such as a Digital Rights Management (DRM) agent and a Secure Sockets Layer (SSL) client. We also demonstrate the applicability of our framework to software architecture exploration in other multiprocessor scenarios.
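A toy version of an exact branch-and-bound mapping for two processors might look like the following. The cost matrices are invented, and the real methodology additionally maps data structures and drives the costs from board measurements:

```python
def branch_and_bound(exec_cost, comm_cost):
    """exec_cost[i][p]: runtime of function i on processor p (0 = host,
    1 = security engine); comm_cost[i][j]: overhead when i and j are split
    across processors. Returns (best total cost, best mapping)."""
    n = len(exec_cost)
    best_cost, best_map = float("inf"), None

    def comm(mapping):
        return sum(comm_cost[i][j]
                   for i in range(len(mapping))
                   for j in range(i + 1, len(mapping))
                   if mapping[i] != mapping[j])

    def search(partial):
        nonlocal best_cost, best_map
        # Lower bound: exact cost of assigned functions plus the cheapest
        # possible execution of the rest, ignoring their communication.
        bound = (sum(exec_cost[i][p] for i, p in enumerate(partial))
                 + comm(partial)
                 + sum(min(row) for row in exec_cost[len(partial):]))
        if bound >= best_cost:
            return  # prune: no completion of this branch can win
        if len(partial) == n:
            best_cost, best_map = bound, tuple(partial)
            return
        for p in (0, 1):
            search(partial + [p])

    search([])
    return best_cost, best_map
```

With a valid lower bound the search stays exact, which matches the abstract's claim of an exact (not heuristic) exploration; pruning just skips branches that provably cannot improve on the incumbent.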
ISBN:
(Print) 1581133995
We consider handheld computing devices which are connected to a server (or a powerful desktop machine) via a wireless LAN. On such devices, it is often possible to save energy on the handheld by offloading its computation to the server. In this work, based on profiling information on computation time and data sharing at the level of procedure calls, we construct a cost graph for a given application program. We then apply a partition scheme to statically divide the program into server tasks and client tasks such that the energy consumed by the program is minimized. Experiments are performed on a suite of multimedia benchmarks. Results show considerable energy saving for several programs through offloading. Copyright 2001 ACM.
As a distributed embedded system, vehicular edge computing (VEC) completes various complex deep neural network (DNN) tasks through network collaboration and communication. However, due to the limited computing power of vehicle processors, vehicles cannot handle increasingly complex DNN tasks. To accurately estimate the execution latency of each layer across different DNN models on heterogeneous devices, we use the Extreme Gradient Boosting (XGBoost) algorithm to predict DNN task inference latency. Furthermore, we propose partitioning and offloading algorithms for both chained DNN tasks and directed acyclic graph (DAG)-type DNN tasks, addressing their distinct computational characteristics. For chained DNN tasks, we employ a linear search to determine optimal partitioning points based on the predictions of the DNN latency model. For DAG-type DNN tasks, we formulate partitioning and offloading as a minimum cut problem on a network flow graph and propose a DNN task partitioning and offloading algorithm based on the highest-label preflow-push (HLPP) method to effectively reduce the cost of task partitioning and offloading. Finally, we verified the results using an experimental vehicle equipped with a Raspberry Pi and an RSU equipped with a Jetson Nano. The experiments show that the proposed XGBoost-based latency prediction model effectively improves the accuracy of layer-by-layer DNN latency prediction, while the partitioning and offloading algorithms for the different types of DNN inference tasks achieve a higher task completion rate, lower latency, and lower energy consumption.
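The linear search over split points for a chained DNN can be sketched as follows. In the paper the per-layer latencies would come from the XGBoost predictor; here they, along with the transfer sizes and bandwidth, are invented for illustration:

```python
def best_split(lat_vehicle, lat_edge, tx_bytes, bandwidth):
    """Pick the split k minimizing end-to-end latency of a chained DNN.

    Layers [0, k) run on the vehicle and layers [k, n) on the edge server;
    tx_bytes[k] is the data crossing the link for split k (tx_bytes[0] is
    the raw input size, tx_bytes[n] = 0 means fully local execution).
    """
    n = len(lat_vehicle)
    return min(range(n + 1),
               key=lambda k: sum(lat_vehicle[:k])
                             + tx_bytes[k] / bandwidth
                             + sum(lat_edge[k:]))
```

With per-layer vehicle latencies [4, 4, 4], edge latencies [1, 1, 1], transfer sizes [8, 2, 6, 0], and unit bandwidth, the search keeps only the first layer on the vehicle, since its small output makes that the cheapest place to cut the chain.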