In computational fluid dynamics (CFD), mesh-smoothing methods are widely used to refine the mesh quality for achieving high-precision numerical simulations. Typically, optimization-based smoothing is used for high-quality mesh smoothing, but it incurs significant computational overhead. Prior works have improved its smoothing efficiency by adopting supervised learning to learn smoothing methods from high-quality meshes. However, they have difficulty smoothing mesh nodes with varying degrees and require data augmentation to address the node-input-ordering problem. Moreover, the required labeled high-quality meshes further limit the applicability of the proposed methods. In this paper, we present the graph-based smoothing mesh net (GMSNet), a lightweight neural network model for intelligent mesh smoothing. GMSNet adopts graph neural networks (GNNs) to extract features of a node's neighbors and outputs the optimal node position. During smoothing, we also introduce a fault-tolerance mechanism to prevent GMSNet from generating negative-volume elements. As a lightweight model, GMSNet can effectively smooth mesh nodes with varying degrees and remains unaffected by the order of the input data. A novel loss function, MetricLoss, is developed to eliminate the need for high-quality meshes and provides stable and rapid convergence during training. We compare GMSNet with commonly used mesh-smoothing methods on two-dimensional (2D) triangle meshes. The results show that GMSNet achieves outstanding mesh-smoothing performance with 5% of the model parameters of the previous model, while offering a speedup of 13.56 times over optimization-based smoothing.
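As a minimal illustration (not the authors' implementation) of the fault-tolerance mechanism described above, the sketch below rejects a model-predicted node position whenever any incident triangle would acquire a non-positive signed area, falling back to the node's current position; the data layout and function names are assumptions made for this example.

```python
import numpy as np

def signed_area(a, b, c):
    """Twice the signed area of 2D triangle (a, b, c); positive for CCW orientation."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def safe_update(node_xy, predicted_xy, one_ring):
    """Accept the predicted position only if no incident triangle inverts.

    node_xy      : (2,) current coordinates of the free node
    predicted_xy : (2,) position proposed by the smoothing model
    one_ring     : iterable of ((px, py), (qx, qy)) pairs, the ordered one-ring
                   neighbors forming triangles (node, p, q)
    """
    for p, q in one_ring:
        if signed_area(predicted_xy, np.asarray(p), np.asarray(q)) <= 0.0:
            return np.asarray(node_xy)  # would create an inverted (negative-volume) element
    return np.asarray(predicted_xy)
```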
All-reduce is a widely used communication technique for distributed and parallel applications, typically implemented using either a tree-based or a ring-based scheme. Each of these approaches has its own limitations: tree-based schemes struggle to exchange large messages efficiently, while ring-based solutions assume constant communication throughput, an unrealistic expectation in modern network communication infrastructures. We present FMCC-RT, an all-reduce approach that combines the advantages of tree- and ring-based implementations while mitigating their drawbacks. FMCC-RT dynamically switches between tree- and ring-based implementations depending on the size of the message being processed. It uses an analytical model to assess the impact of message size on the achieved throughput, enabling the derivation of optimal work-partitioning parameters. Furthermore, FMCC-RT is designed with an Open MPI-compatible API, requiring no modification to user code. We evaluated FMCC-RT through micro-benchmarks and real-world application tests. Experimental results show that FMCC-RT outperforms state-of-the-art tree- and ring-based methods, achieving speedups of up to 5.6×.
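A hedged sketch of the size-based switching idea: the closed-form costs below are the standard alpha-beta (latency/bandwidth) estimates for binomial-tree and ring all-reduce, used here only to illustrate how a crossover message size can be derived. They are not FMCC-RT's actual analytical model, and the parameter values are placeholders.

```python
import math

def tree_cost(n_bytes, p, alpha, beta):
    # Binomial-tree reduce + broadcast: ~2*log2(p) rounds, full message each round.
    return 2 * math.ceil(math.log2(p)) * (alpha + n_bytes * beta)

def ring_cost(n_bytes, p, alpha, beta):
    # Ring reduce-scatter + all-gather: 2*(p-1) steps carrying n/p bytes each.
    return 2 * (p - 1) * (alpha + (n_bytes / p) * beta)

def pick_scheme(n_bytes, p, alpha=1e-6, beta=1e-9):
    """Return 'tree' for latency-bound small messages, 'ring' otherwise."""
    return "tree" if tree_cost(n_bytes, p, alpha, beta) <= ring_cost(n_bytes, p, alpha, beta) else "ring"

if __name__ == "__main__":
    for size in (1 << 10, 1 << 20, 1 << 28):   # 1 KiB, 1 MiB, 256 MiB
        print(size, pick_scheme(size, p=64))
```

For small messages the tree's few latency-dominated rounds win, while for large messages the ring's per-step payload of n/p bytes makes it bandwidth-friendly, which mirrors the trade-off the abstract describes.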
ISBN (digital): 9798331506797
ISBN (print): 9798331506803
The solution of the collision-propagation equation plays an important role in thermal-fluid multi-physics coupling simulation. Through collision-propagation operations on different physical quantities, the lattice Boltzmann method (LBM) can integrate physical fields such as the fluid and thermal fields into the same framework to achieve a coupled solution. In this paper, the collision-propagation equation is realized through LBM, and the simulation process is further componentized to improve the efficiency and reusability of the simulation. The component library includes the main control module, geometric modeling of the simulation field, simulation parameter setting, simulation computing, and a visualization component. In LBM, since the particle motion on each lattice node is independent, the componentized approach makes it easier to assign simulation tasks to multiple processors or compute nodes, thereby increasing computational efficiency.
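For context, the collision-propagation update referred to above is most commonly written in the single-relaxation-time (BGK) form below; this is the standard textbook formulation, not notation taken from the paper.

```latex
% Standard LBM-BGK collision-propagation (streaming) update, textbook form:
% f_i is the particle distribution along lattice direction e_i, tau the relaxation time.
\[
f_i(\mathbf{x} + \mathbf{e}_i\,\Delta t,\; t + \Delta t)
  = f_i(\mathbf{x}, t)
  - \frac{1}{\tau}\left[ f_i(\mathbf{x}, t) - f_i^{\mathrm{eq}}(\mathbf{x}, t) \right]
\]
```

In the common double-distribution-function approach, a second distribution with the same collide-and-stream structure carries the temperature field, which is what lets fluid and thermal quantities share one framework and be updated independently, lattice node by lattice node, in parallel.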
Association between features has been demonstrated to improve the representation ability of data. However, the original association-based data reconstruction method may face two issues: the dimension of the reconstructed data is inevitably higher than that of the original data, and the adopted association measure does not balance effectiveness and efficiency well. To address these two issues, this paper proposes a novel association-based representation improvement method, named AssoRep. AssoRep first obtains the association between features via the distance correlation method, which has some advantages over Pearson's correlation coefficient. Then an enhancement matrix is formed by stacking the association value of every pair of features. Next, an improved feature representation is obtained by aggregating the original features with the enhancement matrix. Finally, the improved feature representation is mapped to a low-dimensional space via principal component analysis. The effectiveness of AssoRep is validated on 120 datasets, and the results further refine our previous work on association-based data reconstruction.
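The sketch below walks through an AssoRep-style pipeline under stated assumptions: the distance-correlation computation is the standard sample estimator, but the aggregation rule (appending association-weighted features to the originals) is a placeholder, since the abstract does not specify how the enhancement matrix and the original features are combined.

```python
import numpy as np
from sklearn.decomposition import PCA

def distance_correlation(x, y):
    """Sample distance correlation between two 1-D feature vectors."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-center the pairwise distance matrices.
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if denom == 0 else np.sqrt(max(dcov2, 0.0) / denom)

def assorep_like(X, n_components=10):
    """Hedged sketch of an AssoRep-style pipeline (aggregation rule is assumed)."""
    n, d = X.shape
    # Enhancement matrix: pairwise feature-feature distance correlations.
    M = np.ones((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            M[i, j] = M[j, i] = distance_correlation(X[:, i], X[:, j])
    # Assumed aggregation: append association-weighted features to the originals.
    X_aug = np.hstack([X, X @ M])
    return PCA(n_components=min(n_components, n, X_aug.shape[1])).fit_transform(X_aug)
```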
A class of geometric asynchronous parallel algorithms for solving large-scale discrete PDE eigenvalue problems has been studied by the author (Sun in Sci China Math 41(8): 701-725, 2011; Sun in Math Numer Sin 34(1): 1-24, 2012; Sun in J Numer Methods Comput Appl 42(2): 104-125, 2021; Sun in Math Numer Sin 44(4): 433-465, 2022; Sun in Sci China Math 53(6): 859-894, 2023; Sun et al. in Chin Ann Math Ser B 44(5): 735-752, 2023). Different from traditional preconditioning algorithms that operate on the discrete matrix directly, our geometric pre-processing algorithm (GPA) is based on a so-called intrinsic geometric invariance, i.e., commutativity between the stiffness matrix A and the grid mesh matrix G: AG = GA. Thus, the large-scale system solver can be replaced with a much smaller block solver as a pre-processing step. In this paper, we study a single PDE and assume G satisfies a periodic condition G^m = I, m << dim(G). Four special cases are studied: the two-point ODE eigenproblem, and Laplace eigenproblems over an L-shaped region, a square ring, and a 3D hexahedron. Two conclusions are obtained: "the parallelism of the geometric mesh pre-transformation is mainly proportional to the number of faces of the polyhedron" and "commutativity of the grid mesh matrix and the mass matrix is the essential condition for the GPA algorithm".
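A toy numerical illustration (not the GPA algorithm itself) of the commutativity property AG = GA with G^m = I: here A is the periodic 1D Laplacian and G the cyclic shift, and changing to G's eigenbasis splits the eigenproblem of A into independent small blocks.

```python
import numpy as np

# Toy setup: periodic 1D Laplacian A and cyclic shift G with G^n = I (assumed example).
n = 8
G = np.roll(np.eye(n), 1, axis=0)          # grid/mesh permutation, G^n = I
A = 2 * np.eye(n) - G - G.T                # periodic 1D Laplacian (circulant)

assert np.allclose(A @ G, G @ A)                              # intrinsic invariance AG = GA
assert np.allclose(np.linalg.matrix_power(G, n), np.eye(n))   # periodic condition G^m = I

# Because A and G commute and G has distinct eigenvalues, G's eigenbasis
# block-diagonalizes A, so the big eigenproblem splits into small independent ones.
w, V = np.linalg.eig(G)
A_hat = np.linalg.inv(V) @ A @ V
off_block = A_hat - np.diag(np.diag(A_hat))
print("max off-block magnitude:", np.abs(off_block).max())    # ~1e-15, i.e. fully decoupled
```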
The naive Bayesian classifier (NBC) is a supervised machine learning algorithm with a simple model structure and good theoretical interpretability. However, the generalization performance of NBC is limited to a large extent by the assumption of attribute independence. To address this issue, this paper proposes a novel attribute grouping-based NBC (AG-NBC), which is a variant of the classical NBC trained with different attribute groups. AG-NBC first applies a novel and effective objective function to automatically identify optimal dependent attribute groups (DAGs). Condition attributes in the same DAG are strongly dependent on the class attribute, whereas attributes in different DAGs are independent of one another. Then, for each DAG, a random vector functional link (RVFL) network with a SoftMax layer is trained to output posterior probabilities in the form of joint probability density estimation. The NBC is then trained using the grouped attributes that correspond to the original condition attributes. Extensive experiments were conducted to validate the rationality, feasibility, and effectiveness of AG-NBC. Our findings show that the attribute groups chosen for NBC accurately represent attribute dependencies and reduce overlaps between different posterior probability densities. In addition, comparative results with NBC, flexible NBC (FNBC), tree-augmented Bayes network (TAN), gain ratio-based attribute-weighted naive Bayes (GRAWNB), averaged one-dependence estimators (AODE), weighted AODE (WAODE), independent component analysis-based NBC (ICA-NBC), the hidden naive Bayesian (HNB) classifier, and the correlation-based feature weighting filter for naive Bayes (CFW) show that AG-NBC obtains statistically better testing accuracies, higher areas under the receiver operating characteristic curve (AUCs), and lower probability mean square errors (PMSEs) than the other Bayesian classifiers. The experimental results demonstrate that AG-NBC is a valid and efficient approach for alleviating the attribute independence assumption of NBC.
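To make the modeling change concrete, the factorizations below contrast the standard NBC assumption with the group-wise assumption described in the abstract; the notation is ours, not the paper's.

```latex
% Standard NBC: every attribute x_j is conditionally independent given class c.
\[
P(c \mid \mathbf{x}) \;\propto\; P(c)\prod_{j=1}^{d} P(x_j \mid c)
\]
% AG-NBC-style grouping: independence is assumed only between the attribute
% groups G_1, ..., G_K; within each group, the joint density P(x_{G_k} | c)
% is estimated by a trained network (an RVFL with a SoftMax layer).
\[
P(c \mid \mathbf{x}) \;\propto\; P(c)\prod_{k=1}^{K} P\!\left(\mathbf{x}_{G_k} \mid c\right)
\]
```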
Large-scale deep learning models are trained distributedly due to memory and computing resource constraints. Few existing strategy generation approaches take optimal memory minimization as the objective. To fill in this gap, we propose a novel algorithm that generates optimal parallelism strategies with the constraint of minimal memory usage. We propose a novel redundant memory cost model to calculate the memory overhead of each operator in a given parallel strategy. To generate the optimal parallelism strategy, we formulate the parallelism strategy search problem into an integer linear programming problem and use an efficient solver to find minimal-memory intra-operator parallelism strategies. Moreover, the proposed algorithm has been extended and implemented in a multi-dimensional parallel training framework and is characterized by high throughput and minimal memory usage. Experimental results demonstrate that our approach achieves memory savings of up to 67% compared to the latest Megatron-LM strategies; in contrast, the gap between the throughput of our approach and that of its counterparts remains small.
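As an illustrative and heavily simplified sketch of the ILP formulation mentioned above, the toy model below picks one parallelism strategy per operator to minimize a made-up redundant-memory cost; the operator names, costs, and the single side constraint are assumptions, not the paper's model.

```python
import pulp

# Toy data: per-operator candidate strategies -> (redundant memory in GB, relative step time).
ops = {
    "matmul_1": {"data_parallel": (12, 1.0), "tensor_parallel": (6, 1.3)},
    "matmul_2": {"data_parallel": (10, 1.0), "tensor_parallel": (5, 1.4)},
}

prob = pulp.LpProblem("min_redundant_memory", pulp.LpMinimize)
x = {(o, s): pulp.LpVariable(f"x_{o}_{s}", cat="Binary")
     for o, strategies in ops.items() for s in strategies}

# Objective: total redundant memory across operators.
prob += pulp.lpSum(ops[o][s][0] * x[o, s] for o, s in x)

# Each operator picks exactly one strategy.
for o, strategies in ops.items():
    prob += pulp.lpSum(x[o, s] for s in strategies) == 1

# Example side constraint: keep total relative step time under a budget.
prob += pulp.lpSum(ops[o][s][1] * x[o, s] for o, s in x) <= 2.5

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({o: s for (o, s), var in x.items() if var.value() == 1})
```

The toy only mirrors the "one strategy per operator, minimize memory, subject to a side constraint" structure of the search problem.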
Partial-label learning (PLL) is a typical weakly supervised learning problem, where each training instance is annotated with a set of candidate labels. Self-training PLL models achieve state-of-the-art performance but suffer from error accumulation caused by mistakenly disambiguated instances. Although co-training can alleviate this issue by training two networks simultaneously and allowing them to interact with each other, most existing co-training methods train two structurally identical networks with the same task, i.e., they are symmetric, rendering them insufficient to correct each other due to their similar limitations. Therefore, in this paper, we propose an asymmetric dual-task co-training PLL model called AsyCo, which forces its two networks, i.e., a disambiguation network and an auxiliary network, to learn from different views explicitly by optimizing distinct tasks. Specifically, the disambiguation network is trained with a self-training PLL task to learn label confidence, while the auxiliary network is trained in a supervised learning paradigm to learn from the noisy pairwise similarity labels that are constructed according to the learned label confidence. Finally, the error accumulation problem is mitigated via information distillation and confidence refinement. Extensive experiments on both uniform and instance-dependent partially labeled datasets demonstrate the effectiveness of AsyCo.
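A minimal sketch of the pairwise-similarity construction the abstract describes, under the assumption (not stated in the abstract) that two instances are labeled "similar" when their most confident candidate labels coincide.

```python
import numpy as np

def pairwise_similarity_labels(confidence):
    """Build noisy pairwise similarity labels from a (n_samples, n_classes)
    label-confidence matrix produced by the disambiguation network.

    Assumption for this sketch: instances i and j are 'similar' (label 1)
    when their argmax-confidence classes agree, and 'dissimilar' (0) otherwise.
    """
    pseudo = confidence.argmax(axis=1)              # most confident candidate label
    return (pseudo[:, None] == pseudo[None, :]).astype(np.float32)

# Example: 4 instances, 3 classes.
conf = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.8, 0.1],
                 [0.6, 0.3, 0.1],
                 [0.2, 0.2, 0.6]])
print(pairwise_similarity_labels(conf))
```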
Sparse triangular solve (SpTRSV) is a vital component in various scientific applications, and numerous GPU-based SpTRSV algorithms have been proposed. Synchronization-free SpTRSV is currently the mainstream algorithm ...