In view of the performance and security problems existing after the replacement of specialized security protection equipment for power grids with domestic software and hardware, and in response to the security protect...
Material Point Method(MPM) is widely used to simulate large deformation processes such as material fracture, collision, and fluid structure interaction. In general, an MPM simulation evolves a dynamic process in which...
ISBN:
(Print) 9798350395679;9798350395662
Data parallelism (DP), tensor parallelism (TP), and pipeline parallelism (PP) are the three strategies widely adopted to enable fast and efficient Large Language Model (LLM) training. However, these approaches rely on data-intensive communication routines to collect, aggregate, and redistribute gradients, activations, and other important model information, which poses significant overhead. MPI libraries co-designed with GPU-based compression libraries have been shown to reduce message sizes significantly and better leverage interconnect bandwidth, increasing training efficiency while maintaining acceptable accuracy. In this work, we investigate the efficacy of compression-assisted MPI collectives in the context of distributed LLM training with 3D parallelism and ZeRO optimizations, scaling up to 192 V100 GPUs on the Lassen supercomputer. First, we enabled a naive compression scheme across all collectives and observed a 22.5% increase in TFLOPS per GPU and a 23.6% increase in samples per second for GPT-NeoX-20B training. However, this strategy ignores the sparsity discrepancy among the messages communicated at each parallelism degree, introducing more error and degrading training loss. We therefore incorporated hybrid compression settings for each parallel dimension and adjusted the compression intensity accordingly: given their low-rank structure [1], we apply aggressive compression to gradients during the DP All-reduce, and milder compression to preserve precision when communicating activations, optimizer states, and model parameters in TP and PP. Using the adjusted hybrid compression scheme, we demonstrate a 17.3% increase in TFLOPS per GPU and a 12.7% increase in samples per second while reaching baseline loss convergence.
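The per-dimension policy described in this abstract can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the class, function, and error-bound values are invented here to show the idea of applying a looser compression bound to DP gradient traffic than to TP/PP traffic.

```python
# Illustrative per-dimension compression policy (all names and bounds
# are assumptions, not the paper's actual configuration).
from dataclasses import dataclass

@dataclass
class CompressionPolicy:
    dimension: str       # "DP", "TP", or "PP"
    error_bound: float   # larger bound -> more aggressive lossy compression

def select_policy(dimension: str) -> CompressionPolicy:
    """Pick a compression intensity based on the parallelism degree.

    Gradients exchanged in the DP All-reduce tolerate aggressive
    compression (low-rank structure), while activations, optimizer
    states, and parameters moved in TP/PP get milder settings to
    preserve precision.
    """
    bounds = {"DP": 1e-2, "TP": 1e-4, "PP": 1e-4}
    return CompressionPolicy(dimension, bounds[dimension])

policy = select_policy("DP")
print(policy.dimension, policy.error_bound)
```

In a real pipeline, the selected `error_bound` would be handed to the GPU-based compressor before each collective call; here it is only a stand-in for that configuration step.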
Big data is an important product of the information age. Integrating big data into smart-grid applications and correctly grasping its key technologies can effectively promote the sustainable development of the power industry and the construction of a strong smart grid; for the modern smart grid, this is both an opportunity and a challenge. 3D point-cloud data processing is the core of reverse-engineering technology. As an important step in the preprocessing stage, point-cloud registration plays an essential role in obtaining the complete 3D coordinates of the measured target surface; however, the registration speed, accuracy, and reliability of current algorithms still need improvement. Cloud computing technology integrates many inexpensive commodity PCs into a computing cluster, enabling the safe storage and efficient processing of massive data. We therefore consider combining cloud computing with data-mining algorithms to solve the massive data-conversion problem in the smart grid. This paper introduces cloud computing technology into the field of smart-grid condition monitoring: by adopting a distributed file system and improving the traditional density-clustering algorithm with a parallel design, it effectively solves the storage and cluster partitioning of condition-monitoring big data, providing a feasible approach for applying cloud computing in this field.
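To make the density-clustering step concrete, here is a minimal serial DBSCAN-style sketch on 2-D points. It is a toy stand-in, not the paper's improved parallel algorithm; the function and parameter names (`eps`, `min_pts`) are the conventional DBSCAN ones, assumed here for illustration.

```python
# Toy density clustering (serial DBSCAN sketch); the paper's parallel,
# distributed-file-system version is not reproduced here.
def dbscan(points, eps, min_pts):
    """Label each 2-D point with a cluster id, or -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        xi, yi = points[i]
        return [j for j, (x, y) in enumerate(points)
                if (x - xi) ** 2 + (y - yi) ** 2 <= eps ** 2]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1          # too sparse: mark as noise for now
            continue
        labels[i] = cluster
        seeds = list(nbrs)
        while seeds:                # expand the cluster outward
            j = seeds.pop()
            if labels[j] == -1:     # noise reached from a core point
                labels[j] = cluster # becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            jn = neighbors(j)
            if len(jn) >= min_pts:  # j is itself a core point
                seeds.extend(jn)
        cluster += 1
    return labels

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
print(dbscan(pts, eps=2.0, min_pts=2))
```

A parallel design along the lines the abstract describes would partition the point set across cluster nodes, run a step like this locally, and then merge clusters that span partition boundaries.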
As the core building block of blockchain technology, Byzantine Fault Tolerance (BFT) consensus has gathered renewed attention during the past decade. A large number of BFT consensus protocols have been proposed since ...
In parallel with the continuously increasing parameter space dimensionality, search and optimization algorithms should support distributed parameter evaluations to reduce cumulative runtime. Intel's neuromorphic o...
To address the problems of large computation and communication in the economic scheduling of multi-distributed power supplies and the difficulty of overall optimization, an economic scheduling method of distributed po...
ISBN:
(Print) 9789819743896;9789819743902
The optimization problem of resource allocation and task scheduling involves effectively utilizing limited computing resources and arranging user service requests or tasks in a distributed computing environment to achieve improved performance, reduced costs, and energy savings. With advances in technologies such as the Internet of Things and 5G, edge computing has emerged as a new paradigm that deploys data processing and analysis capabilities closer to the data source, enabling faster, more secure, and more reliable services. Consequently, the widespread adoption of edge computing presents new challenges and opportunities for resource-allocation and task-scheduling optimization, including the management of massive data streams, adaptation to dynamic network environments, and coordination between edge and cloud computing. This paper summarizes the problems, evaluation dimensions, and methods of resource-allocation and task-scheduling optimization in edge computing, and identifies future prospects and challenges in this area.
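The edge-versus-cloud placement trade-off the survey covers can be illustrated with a deliberately simple greedy scheduler. Everything here is an assumption made for illustration: the task tuples, the 50 ms latency cutoff, and the capacity model are toy choices, not a method from the surveyed literature.

```python
# Toy greedy placement: latency-sensitive tasks go to the edge while
# capacity lasts; everything else falls back to the cloud.
def schedule(tasks, edge_capacity):
    """tasks: list of (name, size, deadline_ms) tuples."""
    placement = {}
    used = 0
    # Serve the tightest deadlines first.
    for name, size, deadline_ms in sorted(tasks, key=lambda t: t[2]):
        if deadline_ms < 50 and used + size <= edge_capacity:
            placement[name] = "edge"
            used += size
        else:
            placement[name] = "cloud"
    return placement

tasks = [("video", 4, 20), ("backup", 8, 500), ("sensor", 1, 10)]
print(schedule(tasks, edge_capacity=5))
```

Real schedulers in this space optimize over many more dimensions (energy, bandwidth, dynamic load), but the sketch shows the basic coordination decision between the edge and cloud tiers.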
This paper presents an approach for power balancing in grid-linked inverters within the context of smart grids, addressing the challenge of unbalanced grid voltage conditions. The proposed method employs a paralleled ...
ISBN:
(Print) 9783031488023;9783031488030
As the Internet of Things (IoT) increasingly empowers the network extremes with in-place intelligence through Machine Learning (ML), energy consumption and carbon emissions become crucial factors. ML is often computationally intensive, with state-of-the-art model architectures consuming significant energy per training round and imposing a large carbon footprint. This work, therefore, argues for the need to introduce novel mechanisms into the ML pipelines of IoT services, so that energy awareness is integrated in the decision-making process for when and where to initiate ML model training.
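One way to picture the energy-aware decision mechanism this abstract argues for is a simple gate that defers training when conditions are unfavorable. The predicate, thresholds, and inputs below are illustrative assumptions, not the paper's proposal.

```python
# Hypothetical energy-aware gate for when to start an ML training round
# on an IoT device (thresholds are illustrative, not from the paper).
def should_train(carbon_intensity_g_per_kwh, battery_pct,
                 max_intensity=300.0, min_battery=40.0):
    """Return True when starting training now is acceptable energy-wise:
    the grid's carbon intensity is low enough and the device has charge
    to spare."""
    return (carbon_intensity_g_per_kwh <= max_intensity
            and battery_pct >= min_battery)

print(should_train(250.0, 80.0))  # low-carbon window, charged device
print(should_train(450.0, 80.0))  # dirty energy mix: defer training
```

The "where" half of the decision would extend the same idea, comparing such scores across candidate devices or edge nodes before placing the training job.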