When the data are stored in a distributed manner, direct applications of traditional statistical inference procedures are often prohibitive due to communication costs and privacy concerns. This article develops and in...
详细信息
When the data are stored in a distributed manner, direct applications of traditional statistical inference procedures are often prohibitive due to communication costs and privacy concerns. This article develops and investigates two communication-efficient accurate statistical estimators (CEASE), implemented through iterative algorithms for distributed optimization. In each iteration, node machines carry out computation in parallel and communicate with the central processor, which then broadcasts aggregated information to node machines for new updates. The algorithms adapt to the similarity among loss functions on node machines, and converge rapidly when each node machine has large enough sample size. Moreover, they do not require good initialization and enjoy linear converge guarantees under general conditions. The contraction rate of optimization errors is presented explicitly, with dependence on the local sample size unveiled. In addition, the improved statistical accuracy per iteration is derived. By regarding the proposed method as a multistep statistical estimator, we show that statistical efficiency can be achieved in finite steps in typical statistical applications. In addition, we give the conditions under which the one-step CEASE estimator is statistically efficient. Extensive numerical experiments on both synthetic and real data validate the theoretical results and demonstrate the superior performance of our algorithms.
In this paper, we discuss several algorithms for scheduling divisible workloads on heterogeneous systems. Our main contributions are (i) new optimality results for single-roundalgorithms and (ii) the design of an asy...
详细信息
In this paper, we discuss several algorithms for scheduling divisible workloads on heterogeneous systems. Our main contributions are (i) new optimality results for single-roundalgorithms and (ii) the design of an asymptotically optimal multi-round algorithm. This multi-round algorithm automatically performs resource selection, a difficult task that was previously left to the user. Because it is periodic, it is simpler to implement, and more robust to changes in the speeds of the processors and/or communication links. On the theoretical side, to the best of our knowledge, this is the first published result assessing the absolute performance of a multi-round algorithm. On the practical side, extensive simulations reveal that our multi-round algorithm outperforms existing solutions on a large variety of platforms, especially when the communication-to-computation ratio is not very high (the difficult case). (C) 2003 Elsevier B.V. All rights reserved.
Numerous studies have been targeting the problem of scheduling divisible workloads in distributed computing environments. The UMR (Uniform multi-round) algorithm stands out from all others by being the first close-for...
详细信息
ISBN:
(纸本)9780889866386
Numerous studies have been targeting the problem of scheduling divisible workloads in distributed computing environments. The UMR (Uniform multi-round) algorithm stands out from all others by being the first close-form optimal scheduling algorithm. However, present algorithms, including the UMR, do not pay due attention to optimizing the set of workers that get selected to participate in processing workload chunks. In addition to the absence of a good resource selection policy, the UMR relies primarily in its computation on the CPU speed and overlooks the role of other key parameters such as network bandwidth. In this paper, we propose an extended version of UMR, called UMR2, that overcomes these limitations and adopts a worker selection policy that aims at minimizing the makespan. We, theoretically and experimentally, show that UMR2 is superior to UMR, specifically in a WAN computing platform such as the Grid.
In this paper, we discuss several algorithms for scheduling divisible workloads on heterogeneous systems. Our main contributions are (i) new optimality results for single-roundalgorithms and (ii) the design of an asy...
详细信息
In this paper, we discuss several algorithms for scheduling divisible workloads on heterogeneous systems. Our main contributions are (i) new optimality results for single-roundalgorithms and (ii) the design of an asymptotically optimal multi-round algorithm. This multi-round algorithm automatically performs resource selection, a difficult task that was previously left to the user. Because it is periodic, it is simpler to implement, and more robust to changes in the speeds of the processors and/or communication links. On the theoretical side, to the best of our knowledge, this is the first published result assessing the absolute performance of a multi-round algorithm. On the practical side, extensive simulations reveal that our multi-round algorithm outperforms existing solutions on a large variety of platforms, especially when the communication-to-computation ratio is not very high (the difficult case). (C) 2003 Elsevier B.V. All rights reserved.
暂无评论