ISBN: (digital) 9781665469500
ISBN: (print) 9781665469500
Multiplying matrices with millions of dimensions places high demands on the computing power and storage space of each node. Coded distributed computing (CDC) addresses this problem by dividing large matrices into small ones and assigning them to machines in a computing cluster, which perform the multiplication in parallel. Coded elastic computing (CEC) was proposed to reflect the fact that computer clusters usually consist of heterogeneous workers with different computing capabilities, and to overcome the performance limitations of CDC schemes that assume homogeneous computing power. However, existing CEC schemes discard the received information and start a new round of computation whenever an elastic event occurs, which wastes computing time and resources. In this paper, we propose to use the received information to redesign the allocation scheme. We first identify the machine that went offline and the data segments it should have returned; these are the parts missing for decoding that must be recomputed. We then count the total amount of lost data for each data segment and calculate the amount of work that each machine should undertake. Finally, the amount of work actually assigned to each machine is obtained by solving a system of linear equations. Experiments show that, compared with the original scheme, the proposed allocation scheme saves resources and time and speeds up the computation.
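The general idea of coded matrix multiplication mentioned in this abstract can be illustrated with a minimal sketch. It is not the allocation scheme of the paper: it assumes a simple Vandermonde (MDS-style) code over exact rationals, and the helper names `encode`, `decode`, `solve`, and `matvec` are hypothetical. A matrix is split into k row blocks, n coded blocks are handed to n workers, and the product is recovered from any k returned results.

```python
from fractions import Fraction


def matvec(M, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def encode(blocks, n):
    """Build n coded blocks; coded block i is sum_j (i+1)**j * blocks[j]."""
    k = len(blocks)
    rows, cols = len(blocks[0]), len(blocks[0][0])
    coded = []
    for i in range(n):
        coeffs = [(i + 1) ** j for j in range(k)]  # Vandermonde row, node i+1
        coded.append([
            [sum(coeffs[j] * blocks[j][r][c] for j in range(k))
             for c in range(cols)]
            for r in range(rows)
        ])
    return coded


def solve(G, Y):
    """Solve G Z = Y exactly by Gauss-Jordan elimination (G is k x k)."""
    k = len(G)
    aug = [[Fraction(g) for g in G[i]] + [Fraction(y) for y in Y[i]]
           for i in range(k)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def decode(results, k):
    """Recover the full product from any k worker results.

    results maps a worker index to the vector that worker returned.
    """
    ids = sorted(results)[:k]
    G = [[(i + 1) ** j for j in range(k)] for i in ids]
    Z = solve(G, [results[i] for i in ids])
    return [val for block in Z for val in block]


if __name__ == "__main__":
    A = [[1, 2], [3, 4], [5, 6], [7, 8]]
    x = [1, 1]
    coded = encode([A[:2], A[2:]], n=4)
    # Workers 0 and 2 are treated as stragglers; only 1 and 3 respond.
    results = {i: matvec(coded[i], x) for i in (1, 3)}
    print([int(v) for v in decode(results, k=2)])
```

Exact rational arithmetic is used here only to keep the sketch free of rounding issues; a practical system would more likely work over floating-point numbers or a finite field.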
ISBN: (print) 9781728195056
Unmanned aerial vehicles (UAVs) have been widely used in wireless edge networks for task offloading because of their agile management and highly flexible deployment. However, because of their limited computation capability and restricted battery life, processing computation-intensive tasks on board may incur excessive latency and energy costs. In this paper, we propose a novel framework with coded distributed computing (CDC) for task offloading from multiple UAVs to ground edge servers, which can save transmission and flying energy in the air and reduce computation latency in terrestrial distributed server networks with stragglers. Specifically, we formulate a latency-energy cost minimization problem to obtain the optimal UAV trajectory schedule and appropriate CDC parameters. We then divide this problem into two sub-problems, which are solved by a cost optimal trajectory schedule (COTS) algorithm and a cost optimal code parameter design (COCPD) algorithm, respectively. Finally, numerical results indicate the feasibility and effectiveness of the proposed framework and confirm that CDC can significantly reduce the cost in the UAV edge computing network.
ISBN: (print) 9781665442664
This paper aims to develop a highly effective framework that significantly improves the efficiency of coded computing techniques for distributed computing tasks over heterogeneous wireless edge networks. In particular, we first formulate a joint coding and node selection optimization problem to minimize the expected total processing time for computing tasks, taking into account the heterogeneity in the nodes' computing resources and communication links. The problem is shown to be NP-hard. To handle this, we leverage the unique characteristics of the problem to develop a linearization approach and a hybrid algorithm based on binary search and branch-and-bound (BB) algorithms. This hybrid algorithm is not only guaranteed to find the optimal solution but also significantly reduces the computational complexity of the BB algorithm. Simulations based on real-world datasets show that the proposed approach can reduce the total processing time by up to 2.4 times compared with the state-of-the-art approach, even without perfect knowledge of the nodes' performance and their straggling parameters.
ISBN: (print) 9789897585296
The development of smart vehicles and rich cloud services has led to the emergence of vehicular edge computing. To perform distributed computation tasks efficiently, coded distributed computing (CDC) was proposed to reduce communication costs and mitigate straggler effects through the use of coding techniques. In this paper, we propose a double auction mechanism that allocates the resources of edge servers to vehicles in order to complete CDC tasks. Specifically, the vehicles use PolyDot codes to manage the tradeoff between communication costs and the recovery threshold. Given the requirements of the various vehicles, the double auction mechanism matches edge servers that have the required resources to the vehicles. It also determines the prices that the vehicles need to pay for the resources of the edge servers. The double auction mechanism satisfies the properties of individual rationality, incentive compatibility, and budget balance.
ISBN: (print) 9781665464833
In this paper, we introduce the DRJLRA algorithm, a load and resource allocation scheme based on deep reinforcement learning (DRL) for a generic multi-master, multi-worker coded distributed computing (CDC) system. Our aim is to minimize the combined communication and computation delay for a set of matrix-vector multiplication tasks. The proposed DRL-based approach has several features that set it apart from the existing literature. First, it is applicable to general CDC systems with multiple masters and workers. Second, it considers multi-task CDC systems with stochastic task arrivals, takes into account the heterogeneity of workers with random computation and communication delays, and uses the state-of-the-art soft actor-critic (SAC) DRL algorithm, making it versatile and efficient in handling complex and dynamic CDC environments. Our results demonstrate that DRJLRA significantly outperforms benchmark schemes. It is thus well suited for real-world CDC systems with diverse and dynamic workloads.
ISBN: (digital) 9781665483414
ISBN: (print) 9781665483414
This paper considers a distributed computing system where nodes are grouped such that nodes in the same group compute the same function. To complete a computing task in a distributed manner, nodes need to exchange their local computation results with each other, which incurs communication cost and raises security concerns. The objective of this work is to study the tradeoff between computation and communication while preventing the exchanged information from leaking to eavesdroppers. Given a fixed computation load, we derive lower bounds on the communication load for 1-group and 2-group systems. New coding schemes are proposed and shown to be weakly secure and to achieve the optimal tradeoffs for 1-group systems and for 2-group systems with large computation load.
ISBN: (print) 9781450397773
Li et al. proposed coded distributed computing (CDC) to reduce the communication load between servers by increasing the computational load of each server. They showed that this scheme achieves the fundamental trade-off between computational load and communication load. However, as the number of servers in this scheme increases, the required numbers of output functions and input files also increase exponentially. In this paper, we propose a new scheme to solve this problem. We show that when the number of servers increases, 1) the number of output functions of the proposed scheme is much smaller than that of Li et al., and the number of output functions required decreases exponentially; 2) the ratio of the communication load of our new scheme to that of Li et al. is less than 1.9981.
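The trade-off and the exponential growth referred to here can be made concrete with a small sketch. The formulas below are not stated in the abstract; they assume the commonly cited form of the Li et al. result, in which K servers with computation load r (each file mapped at r servers) achieve a coded communication load of (1/r)(1 − r/K), against 1 − r/K without coding, and in which the number of input files must be a multiple of the binomial coefficient C(K, r). The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def coded_load(K, r):
    """Coded communication load (1/r) * (1 - r/K), assumed from Li et al."""
    return Fraction(1, r) * (1 - Fraction(r, K))


def uncoded_load(K, r):
    """Communication load 1 - r/K of the uncoded scheme, for comparison."""
    return 1 - Fraction(r, K)


def min_files(K, r):
    """Smallest file count assumed by the scheme: C(K, r) batches."""
    return comb(K, r)


if __name__ == "__main__":
    for K in (10, 20, 40):
        r = K // 2
        print(K, r, coded_load(K, r), uncoded_load(K, r), min_files(K, r))
```

With r fixed at K/2, the coded load shrinks roughly as 1/r while `min_files` grows very quickly with K, which is the scaling problem the abstract sets out to address.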
Li et al. introduced the coded distributed computing (CDC) scheme to reduce the communication load in general distributed computing frameworks such as MapReduce. They also proposed cascaded CDC schemes, in which each output function is computed multiple times, and proved that such schemes achieve the fundamental trade-off between computation load and communication load. However, these schemes require exponentially large numbers of input files and output functions when the number of computing nodes gets large. In this paper, by using the structure of placement delivery arrays (PDAs), we construct several infinite classes of cascaded CDC schemes. We also show that the number of output functions in each of the new schemes is only a factor of the number of computing nodes, and that the number of input files in the new schemes is much smaller than that in the CDC schemes derived by Li et al.
The Metaverse is recognized as the next-generation Internet that provides immersive interaction experiences for users. Convolutional neural networks (CNNs) play a crucial role in providing strong immersive experiences in the Metaverse. However, the Metaverse faces challenges in meeting the escalating demands for computing and storage resources caused by the explosive growth of convolution tasks, which results in severe performance degradation. To tackle these issues, coded distributed computing (CDC) is commonly employed. In this paper, we first propose an efficient and reliable mobile-assisted CDC framework to perform large-scale CNN training tasks for the Metaverse. In this framework, various mobile devices act as workers that contribute their resources and collaborate with each other to complete convolution tasks. Furthermore, we design a novel resilient, secure, and private coded convolution (RSPCC) scheme for the proposed framework. The RSPCC scheme offers several significant advantages. First, it substantially reduces computation latency compared with conventional convolution. Second, it efficiently mitigates the adverse impact of straggling workers that return results exceedingly slowly. Third, it integrates a verifiable computing approach into the encoding/decoding process to check the correctness of the final computation results. Fourth, it accounts for the existence of colluding workers, providing information-theoretic privacy protection for input data. Finally, experimental results demonstrate that the proposed RSPCC scheme can significantly reduce execution time while ensuring the correctness of computation results within the CDC-based Metaverse framework.
In distributed computing systems, computation redundancy is used to mitigate the adverse effect of stragglers on the computation time. The redundancy can be added proactively at the beginning, or reactively after some time based on the delay pattern of the workers. While most existing work on reactive mitigation considers only task replication, we propose a coded reactive straggler mitigation strategy with an uncoded phase and a coded phase for distributed matrix-matrix multiplication. Specifically, in the uncoded phase of the proposed strategy, the master distributes the computational job among the workers without redundancy. After a predetermined waiting time, the master cancels the remaining tasks. It then encodes the remaining tasks and distributes them among the workers. In the uncoded phase, in addition to the conventional erasure model, where workers can communicate only once, we consider a multi-message communication (MMC) model to exploit the partial work done by workers. The optimum waiting time for the uncoded phase and the optimum code rate for the coded phase are also obtained. Our simulation results demonstrate that the proposed coded reactive mitigation significantly decreases the execution time in comparison with both the proactive mitigation strategy and the existing reactive mitigation strategy.
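The two-phase timing described in this abstract can be sketched with a simple model. This is an illustrative assumption, not the analysis of the paper: the function name `reactive_completion_time` is hypothetical, the coded phase is assumed to use an MDS-style code (so any m of the coded tasks recover the m cancelled tasks), the conventional erasure model is used rather than MMC, and the delays are supplied by the caller rather than drawn from a delay distribution.

```python
import math


def reactive_completion_time(uncoded_delays, wait, rate, coded_delays):
    """Completion time of an uncoded phase followed by a coded phase.

    uncoded_delays: finishing time of each uncoded task.
    wait:           waiting time after which unfinished tasks are cancelled.
    rate:           code rate in (0, 1]; m remaining tasks become
                    ceil(m / rate) coded tasks.
    coded_delays:   finishing times of coded tasks, measured from `wait`.
    """
    remaining = sum(1 for d in uncoded_delays if d > wait)
    if remaining == 0:
        # Everything finished in the uncoded phase; no coding is needed.
        return max(uncoded_delays)
    n_coded = math.ceil(remaining / rate)
    if len(coded_delays) < n_coded:
        raise ValueError("not enough coded delays supplied")
    # Assumed MDS property: the coded phase ends once `remaining`
    # of the n_coded coded tasks have returned.
    finish = sorted(coded_delays[:n_coded])
    return wait + finish[remaining - 1]


if __name__ == "__main__":
    total = reactive_completion_time(
        uncoded_delays=[1, 1, 2, 9, 12], wait=2, rate=0.5,
        coded_delays=[1, 4, 3, 5],
    )
    print(total)
```

Sweeping `wait` and `rate` over a grid with sampled delays would give a rough empirical view of the trade-off that the optimum waiting time and code rate are meant to balance.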