ISBN: 9798350304817 (print)
Distributed data-parallel (DDP) training improves overall application throughput because multiple devices each train on a subset of the data and aggregate their updates to produce a globally shared model. The periodic synchronization at each iteration incurs considerable overhead, exacerbated by the increasing size and complexity of state-of-the-art neural networks. Although many gradient compression techniques have been proposed to reduce communication cost, the ideal compression factor that leads to maximum speedup or minimum data exchange remains an open problem, since it varies with the quality of compression, model size and structure, hardware, network topology and bandwidth. We propose GraVAC, a framework that dynamically adjusts the compression factor throughout training by evaluating model progress and assessing the gradient information loss associated with compression. GraVAC works in an online, black-box manner without any prior assumptions about a model or its hyperparameters, while achieving the same or better accuracy than dense SGD (i.e., no compression) in the same number of iterations/epochs. Compared with using a static compression factor, GraVAC reduces end-to-end training time for ResNet101, VGG16 and LSTM by 4.32x, 1.95x and 6.67x, respectively. Compared to other adaptive schemes, our framework provides 1.94x to 5.63x overall speedup.
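The abstract does not specify how GraVAC measures information loss or updates the compression factor, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes Top-k sparsification as the compressor, uses the fraction of squared gradient norm that survives compression as a stand-in for "information loss", and applies a hypothetical multiplicative rule (`adjust_factor`, with made-up `threshold` and `step` parameters) to raise or lower the factor.

```python
import math


def top_k_compress(grad, compression_factor):
    """Keep roughly len(grad) / compression_factor largest-magnitude entries; zero the rest."""
    if compression_factor < 1:
        raise ValueError("compression_factor must be >= 1")
    k = max(1, math.ceil(len(grad) / compression_factor))
    # Indices of the k entries with the largest absolute value.
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(order[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]


def retained_energy(grad, compressed):
    """Fraction of the squared L2 norm that survives compression (1.0 means no loss)."""
    total = sum(g * g for g in grad)
    if total == 0.0:
        return 1.0
    return sum(c * c for c in compressed) / total


def adjust_factor(grad, factor, threshold=0.9, step=2.0,
                  min_factor=1.0, max_factor=1000.0):
    """Hypothetical rule: compress harder while little is lost, otherwise back off."""
    kept = retained_energy(grad, top_k_compress(grad, factor))
    if kept >= threshold:
        return min(factor * step, max_factor)
    return max(factor / step, min_factor)


if __name__ == "__main__":
    grad = [4.0, 0.0, 0.0, 3.0]
    factor = 2.0
    for _ in range(3):
        factor = adjust_factor(grad, factor)
        print(factor)
```

In a real DDP setup the gradient would be a tensor per layer and the compressed values would be exchanged between workers; the flat Python list here only keeps the sketch dependency-free.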
With the rapid development of renewable energy, especially solar energy, distributed photovoltaic power plants have become a crucial component of energy transition. In order to improve the operational efficiency and r...
In this paper, we present a novel unified framework that seamlessly integrates distributed computing and high-density graph computing. Our approach leverages a hybrid architecture that combines the strengths of both p...
Cloud computing is a relatively mature business computing model that developed gradually from technologies such as distributed computing, parallel processing, and grid computing. Similarly, with the continuous em...
This article examines directions and mechanisms for increasing data reliability in computer networks. Currently, the rapid development of information technologies, the rapid growth of data flow, high-quality data proc...
Smartphones have made resource-intensive mobile applications indispensable in everyday life. To meet predetermined requirements and lighten the weight of such challenges brought on by the swift expansion of such servi...
Serverless computing has shown vast potential for big data analytics applications, especially involving machine learning algorithms. Nevertheless, little consideration has been given in the literature to cloud-agnosti...
The socioeconomic development and welfare of communities are significantly reliant on a reliable power supply. Power grid inspections pose significant risks due to hazardous environments, such as high-voltage structur...
Distributed photovoltaic systems are a recently developed form of distributed power generation that offer the benefits of being both environmentally friendly and highly *** power demand continues to rise and the impor...
In a distributed network environment, multiple mutually independent trust domains are formed between different enterprises. During network access, access requests for resources may come from local trust domains or fro...