In distributed training,increasing batch size can improve parallelism,but it can also bring many difficulties to the training process and cause training *** this work,we investigate the occurrence of training errors i...
详细信息
In distributed training,increasing batch size can improve parallelism,but it can also bring many difficulties to the training process and cause training *** this work,we investigate the occurrence of training errors in theory and train ResNet-50 on CIFAR-10 by using Stochastic Gradient Descent(SGD) and Adaptive moment estimation(Adam) while keeping the total batch size in the parameter server constant and lowering the batch size on each Graphics processing Unit(GPU).A new method that considers momentum to eliminate training errors in distributed training is *** define a Momentum-like Factor(MF) to represent the influence of former gradients on parameter updates in each ***,we modify the MF values and conduct experiments to explore how different MF values influence the training performance based on SGD,Adam,and Nesterov accelerated *** results reveal that increasing MFs is a reliable method for reducing training errors in distributed *** analysis of convergent conditions in distributed training with consideration of a large batch size and multiple GPUs is presented in this paper.
In computational fluid dynamics(CFD),mesh-smoothing methods are widely used to refine the mesh quality for achieving high-precision numerical ***,optimization-based smoothing is used for high-quality mesh smoothing,bu...
详细信息
In computational fluid dynamics(CFD),mesh-smoothing methods are widely used to refine the mesh quality for achieving high-precision numerical ***,optimization-based smoothing is used for high-quality mesh smoothing,but it incurs significant computational *** works have improved its smoothing efficiency by adopting supervised learning to learn smoothing methods from high-quality ***,they pose difficulties in smoothing the mesh nodes with varying degrees and require data augmentation to address the node input sequence ***,the required labeled high-quality meshes further limit the applicability of the proposed *** this paper,we present graph-based smoothing mesh net(GMSNet),a lightweight neural network model for intelligent mesh *** adopts graph neural networks(GNNs)to extract features of the node’s neighbors and outputs the optimal node *** smoothing,we also introduce a fault-tolerance mechanism to prevent GMSNet from generating negative volume *** a lightweight model,GMSNet can effectively smooth mesh nodes with varying degrees and remain unaffected by the order of input data.A novel loss function,MetricLoss,is developed to eliminate the need for high-quality meshes,which provides stable and rapid convergence during *** compare GMSNet with commonly used mesh-smoothing methods on two-dimensional(2D)triangle *** results show that GMSNet achieves outstanding mesh-smoothing performances with 5%of the model parameters compared to the previous model,but offers a speedup of 13.56 times over the optimization-based smoothing.
All-reduce is a widely used communication technique for distributed and parallel applications typically implemented using either a tree-based or ring-based scheme. Each of these approaches has its own limitations: tre...
详细信息
All-reduce is a widely used communication technique for distributed and parallel applications typically implemented using either a tree-based or ring-based scheme. Each of these approaches has its own limitations: tree-based schemes struggle with efficiently exchanging large messages, while ring-based solutions assume constant communication throughput,an unrealistic expectation in modern network communication infrastructures. We present FMCC-RT, an all-reduce approach that combines the advantages of tree-and ring-based implementations while mitigating their drawbacks. FMCC-RT dynamically switches between tree and ring-based implementations depending on the size of the message being processed. It utilizes an analytical model to assess the impact of message sizes on the achieved throughput, enabling the derivation of optimal work partitioning parameters. Furthermore, FMCC-RT is designed with an Open MPI-compatible API, requiring no modification to user code. We evaluated FMCC-RT through micro-benchmarks and real-world application tests. Experimental results show that FMCC-RT outperforms state-of-the-art tree-and ring-based methods, achieving speedups of up to 5.6×.
Innovations in powerful high-performance computing (HPC) architecture are enabling high-fidelity whole-core neutron transport simulations at reasonable time. Especially, the currently fashionable heterogeneous archite...
详细信息
Motion and appearance cues play a crucial role in Multi-object Tracking (MOT) algorithms for associating objects across consecutive frames. While most MOT methods prioritize accurate motion modeling and distincti...
详细信息
Recent advances in single-cell RNA sequencing (scRNA-seq) technology provides unprecedented opportunities for reconstruction gene regulation networks (GRNs). At present, many different models have been proposed to inf...
详细信息
4K Video Super-Resolution (VSR) presents a challenging task in video processing, as most existing VSR models have high computational complexity, limiting their application to high-resolution videos, particularly for 4...
详细信息
Anomalies in time series appear consecutively, forming anomaly segments. Applying the classical point-based evaluation metrics to evaluate the detection performance of segments leads to considerable underestimation, s...
详细信息
Sparse triangular solve (SpTRSV) is a vital component in various scientific applications, and numerous GPU-based SpTRSV algorithms have been proposed. Synchronization-free SpTRSV is currently the mainstream algorithm ...
详细信息
Storing files at the network edge has become a new paradigm of storage systems, which is promising to mitigate network congestion and reduce file retrieval latency. However, the traditional file storage scheme cannot ...
详细信息
暂无评论