ISBN: (digital) 9798400706318
ISBN: (print) 9798400706318
Modern Machine Learning (ML) training on large-scale datasets is a very time-consuming workload. It relies on the optimization algorithm Stochastic Gradient Descent (SGD) due to its effectiveness, simplicity, and generalization performance (i.e., test performance on unseen data). Processor-centric architectures (e.g., CPUs, GPUs) commonly used for modern ML training workloads based on SGD are bottlenecked by data movement between the processor and memory units due to the poor data locality in accessing large training datasets. As a result, processor-centric architectures suffer from low performance and high energy consumption while executing ML training workloads. Processing-In-Memory (PIM) is a promising solution to alleviate the data movement bottleneck by placing the computation mechanisms inside or near memory. Several prior works propose PIM techniques to accelerate ML training; however, prior works either do not consider real-world PIM systems or evaluate algorithms that are not widely used in modern ML training. Our goal is to understand the capabilities and characteristics of popular distributed SGD algorithms on real-world PIM systems to accelerate data-intensive ML training workloads. To this end, we 1) implement several representative centralized parallel SGD algorithms, i.e., based on a central node responsible for synchronization and orchestration, on the real-world general-purpose UPMEM PIM system, 2) rigorously evaluate these algorithms for ML training on large-scale datasets in terms of performance, accuracy, and scalability, 3) compare them to conventional CPU and GPU baselines, and 4) discuss implications for future PIM hardware. We highlight the need for a shift to algorithm-hardware codesign to enable decentralized parallel SGD algorithms in real-world PIM systems, which significantly reduces the communication cost and improves scalability. Our results demonstrate three major findings: 1) The general-purpose UPMEM PIM system can be a viable alternat
Optimization-based energy management strategies (EMSs) can improve the performance of fuel cell vehicles (FCVs). Ongoing efforts mostly focus on optimizing a centralized EMS using a variety of high-computing technologies, without offering appropriate scalability and modularity for the onboard powertrain components. In real-time applications, the ability of an EMS to finish its computation on time is crucial; hence, decentralized EMSs with low-cost components and limited processing capability are necessary. Local units handle the computation load on a modular platform. In addition, the decentralized system's plug-and-play functionality minimizes the total cost. This paper presents a decentralized model predictive control (D-MPC) based on the consensus-based alternating direction method of multipliers (C-ADMM) that explicitly considers the coordination of the dynamic reactions of powertrain components and future driving profiles. In addition, a decentralized learning method is proposed to seek the optimal policy for the moving horizon dimensions in the D-MPC using the federated reinforcement learning (FRL) algorithm in order to improve processing time. Due to the deployment of a fully modular system in the proposed learning technique, agents are restricted from sharing their trajectories. Using a highly dynamic module-to-module communication layer in a fully decentralized arrangement, the powertrain components utilize the multi-step method to attain the global optimum. The performance of the proposed framework is evaluated with regard to its precision, convergence speed, and scalability. The results of numerical simulation and implementation demonstrate that the proposed method is superior to the centralized and fixed-horizon MPC approaches.
ISBN: (print) 9781665454681
The distributed alternating direction method of multipliers (ADMM) is an efficient distributed optimization algorithm, but it converges poorly in time-varying network topologies. To address this challenge, we propose TV-ADMM, a novel distributed ADMM algorithm for time-varying communication networks. More specifically, importance weight parameters are introduced in message fusion to mitigate the potential error caused by the network topology dynamics. Based on that, the updating rules are designed with a first-order approximation and a Bregman divergence term, which can reduce the variance caused by the randomness and enhance robustness. Moreover, we consider two practical scenarios with time-varying communication networks. In Scenario One, the communication between two nodes succeeds with certain probabilities, based on which the importance weight parameters are designed. Scenario Two considers mobile agents, where the communication link is determined by the distance between two agents. We derive the connectivity probability in this scenario and obtain the corresponding importance weight. Numerical simulations validate the effectiveness of the proposed algorithm in both scenarios, in comparison with the subgradient-based method.
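For orientation, the sketch below shows textbook global-consensus ADMM on a static network, which is the baseline that a time-varying variant builds on. It is not TV-ADMM: it has no importance weights, no first-order approximation, and no Bregman term. The function name `consensus_admm`, the scalar quadratic local costs f_i(x) = 0.5 (x - a_i)^2, and the penalty `rho` are assumptions chosen only to keep the example small.

```python
def consensus_admm(a, rho=1.0, iters=100):
    """Global-consensus ADMM for min_x sum_i 0.5 * (x - a[i])**2.

    Illustrative only: each agent i holds a private value a[i]; the
    minimizer of the summed cost is the mean of a.
    """
    n = len(a)
    z = 0.0              # shared consensus variable
    u = [0.0] * n        # scaled dual variable of each agent
    for _ in range(iters):
        # Local step: closed-form minimizer of
        # 0.5*(x - a_i)^2 + (rho/2)*(x - z + u_i)^2
        x = [(a[i] + rho * (z - u[i])) / (1.0 + rho) for i in range(n)]
        # Fusion step: average the agents' messages x_i + u_i
        z = sum(x[i] + u[i] for i in range(n)) / n
        # Dual step: accumulate each agent's disagreement with z
        u = [u[i] + x[i] - z for i in range(n)]
    return z


estimate = consensus_admm([1.0, 2.0, 6.0])
```

Here the fusion step assumes every message arrives in every round; that assumption is what breaks when links fail or agents move, which is the situation the abstract addresses.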
We focus on the problem of deploying unmanned aerial vehicles to service mobile users in a cellular network, with the aim of maximizing coverage and reducing interference effects. An optimization model that is distributed in nature is proposed and a maximizing algorithm is developed to find a locally optimal solution. The performance of this distributed algorithm is shown to be superior in quality and solution time to a standard greedy algorithm. Testing on a simulation of a practical scenario is performed to demonstrate the application of the method to real scenarios as well as to illustrate the tradeoff between maximizing coverage and minimizing interference.
This paper proposes three different distribution strategies for very large 3D image deconvolution algorithms. The deconvolution problem is generic and tailored for spatio-spectral 3D image reconstruction. The three proposed algorithms for large-scale data are distributed in the sense that both the storage and the computations are distributed over several compute nodes. As a result, the workload is drastically reduced compared to a centralized approach where the storage and the computations are handled by a single compute node. The proposed algorithms are validated through experiments on simulated astronomical images.
We formulate a network security problem as a zero-sum game between an attacker who tries to disrupt a network by disabling one or more nodes, and the nodes of the network, which must allocate limited resources to defend the network. The utility of the zero-sum game can be one of several network performance metrics that correspond to node centrality measures. We first present a fast centralized algorithm that uses a monotone property of the utility function to compute saddle-point equilibrium strategies for the case of single-node attacks and single- or multiple-node defense. We then extend the approach to the distributed setting by computing the necessary quantities using a finite-time distributed averaging algorithm. For simultaneous attacks on multiple nodes, the computational complexity grows quickly, so we propose a method to approximate the saddle-point equilibrium strategies based on sequential simplification, which performs well in simulations.
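As a rough illustration of distributed averaging, the sketch below runs the standard linear consensus iteration, in which each node repeatedly moves toward its neighbors' values. Note that this version converges only asymptotically; the abstract refers to a finite-time variant, which is not reproduced here. The function name `average_consensus`, the 4-node path graph, and the step size are assumptions for illustration.

```python
def average_consensus(values, neighbors, step=0.3, iters=200):
    """Linear average consensus on an undirected graph.

    Each node i only reads the current values of neighbors[i].
    The step must be below 1 / (maximum node degree) for this simple
    update to be guaranteed to converge.
    """
    x = list(values)
    for _ in range(iters):
        # Synchronous update: every node uses the previous round's values
        x = [xi + step * sum(x[j] - xi for j in neighbors[i])
             for i, xi in enumerate(x)]
    return x


# Hypothetical 4-node path graph: 0 - 1 - 2 - 3
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
result = average_consensus([1.0, 2.0, 3.0, 4.0], path)
```

Because the graph is undirected, the sum of the node values is preserved at every round, so all nodes approach the network-wide mean without any node seeing more than its neighbors' values.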
ISBN: (print) 9781509067817
This paper investigates distributed resource allocation optimization for continuous-time multi-agent systems with discrete-time communication. A gradient-based continuous-time algorithm is proposed to solve this network resource allocation problem. A sufficient condition on the communication period is given under which the proposed algorithm achieves exact optimization with an exponential convergence rate. Finally, an example of economic dispatch in power grids illustrates the effectiveness of the presented algorithm.
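To make the economic dispatch setting concrete, the sketch below applies a forward-Euler discretization of a well-known gradient-based scheme in which neighboring agents exchange marginal costs and shift load toward the cheaper one. It is not the paper's algorithm and does not model its communication-period condition. The function name `laplacian_gradient_dispatch`, the quadratic costs f_i(x) = 0.5 c_i x^2 + b_i x, the 3-generator path graph, and the step size are all assumptions.

```python
def laplacian_gradient_dispatch(x0, c, b, neighbors, step=0.1, iters=500):
    """Distributed resource allocation with total supply held fixed.

    Minimizes sum_i (0.5*c[i]*x_i**2 + b[i]*x_i) subject to
    sum_i x_i = sum(x0). On an undirected graph every exchange between
    two neighbors cancels, so the total never changes.
    """
    x = list(x0)
    n = len(x)
    for _ in range(iters):
        # Marginal cost of each agent at its current allocation
        g = [c[i] * x[i] + b[i] for i in range(n)]
        # Move load away from agents whose marginal cost exceeds a neighbor's
        x = [x[i] - step * sum(g[i] - g[j] for j in neighbors[i])
             for i in range(n)]
    return x


# Hypothetical example: 3 generators on a path 0 - 1 - 2, total demand 7
line = {0: [1], 1: [0, 2], 2: [1]}
allocation = laplacian_gradient_dispatch(
    [7.0, 0.0, 0.0], c=[1.0, 2.0, 4.0], b=[0.0, 0.0, 0.0], neighbors=line)
```

At the optimum all marginal costs c_i x_i are equal; with these costs and a demand of 7 the common value is 4, which gives the allocation (4, 2, 1).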