Global trading is undergoing significant changes, necessitating modifications to trading strategies. This study presents a newly developed cloud-based trading strategy that uses Amazon Web Services (AWS), machine ...
For future Internet of Vehicles (IoV), communications and computing will converge to provide services. Federated learning (FL), a typical distributed computing technology, needs to be integrated with IoV...
Unmanned Aerial Vehicles (UAVs) have recently been leveraged in a massive number of Internet of Things (IoT) applications. However, given the stringent limitations of UAVs, investigating their performance in terms of th...
Given a directed graph G = (V, E) with n vertices, m edges and a designated source vertex s ∈ V, we consider the question of finding a sparse subgraph H of G that preserves the flow from s up to a given threshold λ ...
This work presents an essential module for transfer learning-based classification of melanoma skin lesions. Melanoma, a highly lethal form of skin cancer, poses a significant global health threat. Image...
Considering the problems of limited energy in wireless multimedia sensor networks (WMSNs) and the discontinuity of focused regions in fused images obtained with traditional multi-scale analysis tool (MST)-based...
In Bangladesh, most four-legged intersections use a static-timed or manually-controlled traffic signaling approach, which creates unavoidable congestion and requires human involvement. This paper proposes an adaptive ...
Fault diagnosis of rotating machinery driven by induction motors has received increasing attention. Current diagnostic methods, which can be performed on existing inverters or current transformers of three-phase induc...
Currently, electricity demand is constantly increasing all over the world and far exceeds production. As a result, the whole world is facing a global energy problem. In this decad...
Gradient compression is a promising approach to alleviating the communication bottleneck in data parallel deep neural network (DNN) training by significantly reducing the data volume of gradients for synchronization. While gradient compression is being actively adopted by the industry (e.g., Facebook and AWS), our study reveals that there are two critical but often overlooked challenges: 1) inefficient coordination between compression and communication during gradient synchronization incurs substantial overheads, and 2) developing, optimizing, and integrating gradient compression algorithms into DNN systems imposes heavy burdens on DNN practitioners, and ad-hoc compression implementations often yield surprisingly poor system performance. In this paper, we propose a compression-aware gradient synchronization architecture, CaSync, which relies on flexible composition of basic computing and communication primitives. It is general and compatible with any gradient compression algorithm and gradient synchronization strategy, and enables high-performance computation-communication pipelining. We further introduce a gradient compression toolkit, CompLL, to enable efficient development and automated integration of on-GPU compression algorithms into DNN systems with little programming burden. Lastly, we build HiPress, a compression-aware DNN training framework, with CaSync and CompLL. HiPress is open-sourced and runs on mainstream DNN systems such as MXNet, TensorFlow, and PyTorch. Evaluation on a 16-node cluster with 128 NVIDIA V100 GPUs and a 100 Gbps network shows that HiPress improves training speed over current compression-enabled systems (e.g., BytePS-onebit, Ring-DGC, and PyTorch-PowerSGD) by 9.8%-69.5% across six popular DNN models.
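For context on how gradient compression is typically wired into a DNN system today, the sketch below shows the PyTorch-PowerSGD style of integration that the abstract uses as a baseline: a DistributedDataParallel model whose dense all-reduce is replaced by a PowerSGD compression hook. This is not HiPress/CaSync/CompLL code; the helper name build_compressed_ddp, the rank and warm-up values, and the device setup are illustrative assumptions.

```python
# Minimal sketch (not HiPress code) of baseline-style gradient compression
# in PyTorch, via the built-in PowerSGD DDP communication hook.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.algorithms.ddp_comm_hooks import powerSGD_hook as powerSGD

def build_compressed_ddp(model: torch.nn.Module) -> DDP:
    # Assumes the default process group is already initialized (e.g., via torchrun).
    local_rank = dist.get_rank() % torch.cuda.device_count()
    ddp_model = DDP(model.cuda(local_rank), device_ids=[local_rank])

    # PowerSGD compresses each gradient bucket with a low-rank approximation.
    # matrix_approximation_rank and start_powerSGD_iter are illustrative values.
    state = powerSGD.PowerSGDState(
        process_group=None,            # use the default process group
        matrix_approximation_rank=2,   # higher rank = less compression, more accuracy
        start_powerSGD_iter=10,        # warm up with uncompressed all-reduce first
    )

    # The hook replaces DDP's dense all-reduce with compressed synchronization.
    ddp_model.register_comm_hook(state, powerSGD.powerSGD_hook)
    return ddp_model
```

With the hook registered, the training loop itself is unchanged: compression and synchronization run inside DDP's bucketed backward pass. Coordinating that compression with communication efficiently, and integrating on-GPU compression kernels without hand-written glue code, is the gap that CaSync and CompLL target according to the abstract.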