ISBN: (print) 9798350339864
Communication scheduling has been shown to be effective in accelerating distributed training because it enables all-reduce communications to be overlapped with backpropagation computations. It has been widely adopted in popular distributed deep learning frameworks. However, two fundamental problems remain: (1) each all-reduce operation incurs excessive startup latency that is proportional to the number of workers; (2) training performance is sub-optimal because of the dependency and synchronization requirements of the feed-forward computation in the next iteration. We propose a novel scheduling algorithm, DeAR, that decouples the all-reduce primitive into two continuous operations, which overlap with both backpropagation and feed-forward computations without extra communications. We further design a practical tensor fusion algorithm to improve the training performance. Experimental results with five popular models show that DeAR achieves up to 83% and 15% training speedup over the state-of-the-art solutions on a 64-GPU cluster with 10 Gb/s Ethernet and 100 Gb/s InfiniBand interconnects, respectively.
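The abstract does not name the two operations that the all-reduce is split into. A minimal single-process sketch, assuming the standard decomposition into a reduce-scatter followed by an all-gather (the function names and the list-of-lists layout are illustrative, not DeAR's API), shows why the split adds no extra communication: composing the two halves yields exactly the all-reduce result.

```python
from typing import List

def reduce_scatter(grads: List[List[float]]) -> List[List[float]]:
    """Simulated reduce-scatter: worker i ends up holding the element-wise
    sum of chunk i across all workers. grads[w] is worker w's full gradient."""
    n = len(grads)
    size = len(grads[0])
    assert size % n == 0, "sketch assumes the gradient splits evenly"
    chunk = size // n
    return [
        [sum(g[j] for g in grads) for j in range(i * chunk, (i + 1) * chunk)]
        for i in range(n)
    ]

def all_gather(chunks: List[List[float]]) -> List[List[float]]:
    """Simulated all-gather: every worker receives all reduced chunks."""
    full = [x for c in chunks for x in c]
    return [list(full) for _ in chunks]

def all_reduce(grads: List[List[float]]) -> List[List[float]]:
    """All-reduce expressed as the two halves run back to back."""
    return all_gather(reduce_scatter(grads))

if __name__ == "__main__":
    worker_grads = [[1, 2, 3, 4], [10, 20, 30, 40]]
    print(reduce_scatter(worker_grads))  # partial result held per worker
    print(all_reduce(worker_grads))      # every worker holds the full sum
```

Under this assumption, the first half could be scheduled during backpropagation and the second half deferred until the next iteration's feed-forward pass needs the tensor. The sketch models only the data movement, not the timing or the tensor fusion.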
ISBN: (print) 9798350339864
Distributed learning has been widely adopted to train a global model from local data. However, its performance can be severely affected by stragglers. Recent research has addressed the straggler problem with gradient coding, whose essence is to tolerate stragglers by adding data redundancy. However, the large amount of data redundancy, together with the computation and communication overhead it brings, remains hard to resolve. In addition, the complexity of encoding and decoding increases linearly with the number of local workers. To this end, we design a lightweight coding method for the computing phase and seek to ensure fair transmission in the communication phase. Specifically, to tolerate stragglers in the computing phase, we propose a two-stage dynamic coding scheme: some of the workers start computing partial gradients from the data partitions assigned in the first stage, and the remaining workers used for computation in the second stage are decided based on which workers have finished in the first stage. To further tolerate stragglers in the communication phase, a perturbed Lyapunov function is designed to maximize admission data balancing fairness as well as throughput. The experimental results verify the derived properties and demonstrate that the proposed solution achieves better accuracy and resource utilization in the distributed learning system for practical network parameters and benchmark data.
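To make the idea of "tolerating stragglers by adding data redundancy" concrete, the following sketch uses a well-known three-worker gradient coding example that survives any single straggler. It illustrates the baseline idea only, not the two-stage dynamic scheme proposed in the paper; the matrix `B`, the `DECODE` table, and the scalar gradients are illustrative assumptions.

```python
from typing import Dict, List

# Each worker stores two of the three data partitions (the redundancy) and
# sends one coded combination of its partial gradients.
# Rows are workers, columns are partitions.
B = [
    [0.5, 1.0, 0.0],   # worker 0 sends g0/2 + g1
    [0.0, 1.0, -1.0],  # worker 1 sends g1 - g2
    [0.5, 0.0, 1.0],   # worker 2 sends g0/2 + g2
]

# Decoding weights for every pair of workers that may finish first.
DECODE = {
    frozenset({0, 1}): {0: 2.0, 1: -1.0},
    frozenset({0, 2}): {0: 1.0, 2: 1.0},
    frozenset({1, 2}): {1: 1.0, 2: 2.0},
}

def encode(partial_grads: List[float]) -> List[float]:
    """Coded message each worker would send (scalar gradients for brevity)."""
    return [sum(b * g for b, g in zip(row, partial_grads)) for row in B]

def decode(received: Dict[int, float]) -> float:
    """Recover g0 + g1 + g2 from the messages of any two workers."""
    weights = DECODE[frozenset(received)]
    return sum(weights[w] * received[w] for w in received)

if __name__ == "__main__":
    coded = encode([2.0, 3.0, 5.0])
    # Worker 2 is the straggler; workers 0 and 1 suffice.
    print(decode({0: coded[0], 1: coded[1]}))
```

The cost visible here is the one the abstract criticizes: every worker processes twice as much data as in the uncoded case, and the decoding table grows with the number of workers.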
ISBN: (print) 9798350385939; 9798350385922
In Synchronous Reluctance Machines (SynRM), achieving a higher saliency ratio and lower torque ripple is important. While Fractional Slot Concentrated Winding (FSCW) machines have better efficiency because of their shorter winding overhang, their saliency is significantly reduced by undesirable magnetomotive force (MMF) space harmonics. Distributed windings (DW), on the other hand, have lower torque ripple and higher saliency, but they suffer from lower efficiency because of their longer winding overhang. This paper presents methods of improving saliency by modifying the FSCW winding layout, which helps to reduce or eliminate certain undesired MMF harmonics while achieving a shorter overhang than a distributed winding. In addition to saliency, various performance metrics, including voltage distortion, efficiency, power factor, and torque ripple, are examined and compared with those of FSCW and distributed winding configurations.
ISBN: (print) 9798350322811
This paper presents work in progress towards a system modeling and co-emulation framework for distributed cyber-physical system (CPS) environments. The proposed framework aims to support experiential learning and experiment orchestration in environments such as CPS testbeds and chemistry labs. It addresses challenges of interoperability, multi-tenancy, scalability and security by leveraging a novel "co-emulation" approach that combines different modeling, orchestration and runtime tools.
ISBN: (print) 9798350339864
Distributed persistent key-value stores (KVSs) play an important role in today's storage infrastructure. The development of persistent memory (PM) and remote direct memory access (RDMA) makes it possible to build distributed persistent KVSs that provide fast data access. However, prior works focus on either PM-oriented or RDMA-oriented optimizations for key-value stores. We find that these optimizations prevent a simple porting of an RDMA-enabled KVS to PM or vice versa. This paper proposes FastStore, a high-performance distributed persistent KVS that fully exploits RDMA features and PM-friendly optimizations. First, FastStore uses RDMA-enabled PM exposure to establish direct indexing at the client side, reducing the round-trip times (RTTs) needed to read values. PM exposure also allows PM to be shared among cluster nodes, which helps to mitigate attribute-value skewness. Second, FastStore designs a PM-friendly ownership-transferring log and a failure-atomic slotted-page allocator to achieve highly efficient PM management without PM leakage. Finally, FastStore adds volatile search keys to its B+tree index to reduce excessive PM accesses. We implement FastStore, and the evaluation shows that it achieves 2.8x higher throughput and 71.5% fewer RTTs than the state-of-the-art ordered KVS Sherman.
ISBN: (print) 9781665428323
The proceedings contain 39 papers. The topics discussed include: blockchain-based user-centric electronic health record management system; virtual mouse using hand gesture recognition - a systematic literature review; face mask detection under the threat of Covid-19 virus; fusing clustering and machine learning techniques for big-mart sales predication; cognitive intelligence of a cloud-based internet of things in precision agriculture applications; a distributed agent-oriented framework for blockchain-enabled supply chain management; internet of things based controlled environment for the production of shiitake mushroom; deepfake video detection using neural networks; online storage, retrieval & authentication of healthcare documents using Ethereum blockchain; transparency in carbon credit by automating data-management using blockchain; performance evaluation of proof of scope consensus mechanisms on Hyperledger; face mask detection and recognition system; and integrating comparison of malware detection classification using LGBM and XGB machine learning algorithms.
ISBN: (print) 9798350377712; 9798350377705
In this paper, we explore the field of self-reconfigurable modular robots, which represent a significant advance in robotic technology. These robots offer high adaptability and flexibility for a variety of applications. However, computing their stability is challenging: it is computationally intensive, and it needs to be distributed and fast, as close to real time as possible. We introduce a distributed algorithm designed to overcome these challenges while taking mechanical constraints into account. At the heart of this algorithm is the notion of the "support polygon", which enables the stability of a modular robot to be assessed in real time. The algorithm is based on a fully distributed tree partitioning approach, which facilitates efficient communication and collaboration between modules. It also uses a polygon merging approach to reduce the number of messages exchanged when creating the support polygon, thus significantly reducing response time. The response time of the method is very small compared with that reported in other research. We also present results on the VisibleSim simulator, as well as experimental validation on real robotic modules, which underlines the practical viability of the approach. Overall, this work lays a solid base for further advances aiming to guarantee the stability of modular robots.
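The abstract does not say how the support polygon is built or tested. A minimal centralized sketch, assuming the usual static-stability criterion (the support polygon is the convex hull of the ground contact points, and the robot is stable when the ground projection of its center of mass lies inside it), is given below. The function names are hypothetical, and the distributed tree messaging and mechanical constraints are not modeled.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def merge_polygons(a: List[Point], b: List[Point]) -> List[Point]:
    """Merge two partial support polygons by re-hulling their vertices, so a
    subtree forwards only hull vertices instead of every contact point."""
    return convex_hull(a + b)

def is_stable(polygon: List[Point], com_xy: Point) -> bool:
    """True if the projected center of mass is inside or on the polygon."""
    n = len(polygon)
    if n < 3:
        return False
    return all(cross(polygon[i], polygon[(i + 1) % n], com_xy) >= 0
               for i in range(n))

if __name__ == "__main__":
    left = convex_hull([(0, 0), (1, 0), (1, 1), (0, 1)])
    right = convex_hull([(2, 0), (3, 0), (3, 1), (2, 1)])
    support = merge_polygons(left, right)
    print(support, is_stable(support, (1.5, 0.5)))
```

Merging hulls rather than raw contact points is one plausible reading of how polygon merging reduces message volume: interior vertices are discarded at each merge, so the data passed up the tree stays small.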
ISBN: (print) 9798350322811
Modern manufacturing systems are characterized by the multiple dimensions of their complexity. They are numerically complex, as they consist of several components. They are logically complex, as multiple and varied links exist among the different components. They are technologically complex, as a mix of different hardware and software technologies and architectures is typically found. They are geographically complex, as they often extend across multiple physical locations and sometimes involve multiple organizations. However, resilience to predictable and unpredictable events through timely, efficient, and effective reconfiguration of the whole manufacturing ecosystem remains a key objective, as it is a key enabler of industry competitiveness. In this work, an innovative approach based on API request collections, containerization technologies, and past research on remotely reconfigurable distributed systems is proposed for achieving ultimate resilience in modern industry.
ISBN: (print) 9798350387964; 9798350387957
Standard distribution middleware has traditionally been perceived as complex software that is not suitable for satisfying the highest certification criteria in safety-critical environments. However, this idea is slowly changing, and there are efforts such as the Future Airborne Capability Environment (FACE) consortium to integrate standard distribution middleware into the development of avionic systems. This integration facilitates the interoperability and portability of avionic applications, but there are still challenges that need to be addressed before full success can be achieved. To this end, this paper explores the usage of the Data Distribution Service for Real-Time Systems (DDS) on top of a partitioned system with a communication network based on the ARINC 664 specification (specifically, the AFDX network). This work identifies the incompatibilities between the two standards and proposes potential solutions. A set of overhead metrics for using DDS in a distributed partitioned platform is also provided.
ISBN: (print) 9798350363999; 9798350364002
A computing power network (CPN) is a distributed network system designed to connect and integrate computing resources globally, enabling efficient sharing and utilization of computing power. In dependent task offloading, the dependency relationship between tasks is generally used to determine the execution order. However, the assignment phase often overlooks the relevance and sharing between tasks, leading to a waste of system resources in CPNs. In the learning process, agents frequently encounter sparse rewards, which result in slow learning and make it challenging to develop effective strategies. To address these issues, we design a dependent task offloading method based on hypergraph partitioning and an intrinsic curiosity module (ICM), called HP-ICM, which offloads tasks with similar resource requirements or dependencies into the same partition and uses the ICM to enhance the speed and quality of learning. Simulation results show that HP-ICM reduces latency by 22.8% and energy consumption by 25.7% compared to the PPO baseline during task offloading.
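As a rough illustration of how an intrinsic curiosity module can counter sparse rewards, the sketch below computes the curiosity bonus in its commonly used form: a scaled squared error between the feature vector a forward model predicted for the next state and the features actually observed. The abstract does not give HP-ICM's exact formulation, so the function names, the scaling factor `eta`, and the plain-list feature vectors are assumptions.

```python
from typing import Sequence

def intrinsic_reward(predicted_next: Sequence[float],
                     actual_next: Sequence[float],
                     eta: float = 0.5) -> float:
    """Curiosity bonus: (eta / 2) * squared prediction error of the forward
    model in feature space. Poorly predicted transitions earn more reward."""
    assert len(predicted_next) == len(actual_next)
    sq_err = sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))
    return 0.5 * eta * sq_err

def total_reward(extrinsic: float,
                 predicted_next: Sequence[float],
                 actual_next: Sequence[float],
                 eta: float = 0.5) -> float:
    """Reward passed to the policy update: environment reward plus bonus."""
    return extrinsic + intrinsic_reward(predicted_next, actual_next, eta)

if __name__ == "__main__":
    # The environment gives no reward for this step, but the transition was
    # poorly predicted, so the agent still receives a learning signal.
    print(total_reward(0.0, [1.0, 2.0], [1.0, 4.0]))
```

In a full agent the predicted features would come from a learned forward model and the combined reward would feed a policy-gradient update such as PPO; both are omitted here.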