In computational fluid dynamics (CFD), mesh-smoothing methods are widely used to refine the mesh quality for achieving high-precision numerical simulations. Typically, optimization-based smoothing is used for high-quality mesh smoothing, but it incurs significant computational overhead. Prior works have improved its smoothing efficiency by adopting supervised learning to learn smoothing methods from high-quality meshes. However, they have difficulty smoothing mesh nodes with varying degrees and require data augmentation to address the node-input-ordering problem. Moreover, the required labeled high-quality meshes further limit the applicability of the proposed methods. In this paper, we present the graph-based smoothing mesh net (GMSNet), a lightweight neural network model for intelligent mesh smoothing. GMSNet adopts graph neural networks (GNNs) to extract features of a node's neighbors and outputs the optimal node position. During smoothing, we also introduce a fault-tolerance mechanism to prevent GMSNet from generating negative-volume elements. As a lightweight model, GMSNet can effectively smooth mesh nodes with varying degrees and remains unaffected by the order of the input data. A novel loss function, MetricLoss, is developed to eliminate the need for high-quality meshes and provides stable and rapid convergence during training. We compare GMSNet with commonly used mesh-smoothing methods on two-dimensional (2D) triangle meshes. The results show that GMSNet achieves outstanding mesh-smoothing performance with 5% of the model parameters of the previous model, while offering a speedup of 13.56 times over optimization-based smoothing.
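As a minimal illustration (not the authors' implementation) of the fault-tolerance mechanism described above, the sketch below rejects a model-predicted node position whenever any incident triangle would acquire a non-positive signed area, falling back to the node's current position; the data layout and function names are assumptions made for this example.

```python
import numpy as np

def signed_area(a, b, c):
    """Twice the signed area of 2D triangle (a, b, c); positive for CCW orientation."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def safe_update(node_xy, predicted_xy, one_ring):
    """Accept the predicted position only if no incident triangle inverts.

    node_xy      : (2,) current coordinates of the free node
    predicted_xy : (2,) position proposed by the smoothing model
    one_ring     : iterable of ((px, py), (qx, qy)) pairs, the ordered one-ring
                   neighbors forming triangles (node, p, q)
    """
    for p, q in one_ring:
        if signed_area(predicted_xy, np.asarray(p), np.asarray(q)) <= 0.0:
            return np.asarray(node_xy)  # would create an inverted (negative-volume) element
    return np.asarray(predicted_xy)
```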
All-reduce is a widely used communication technique for distributed and parallel applications, typically implemented using either a tree-based or a ring-based scheme. Each of these approaches has its own limitations: tree-based schemes struggle to exchange large messages efficiently, while ring-based solutions assume constant communication throughput, an unrealistic expectation in modern network communication infrastructures. We present FMCC-RT, an all-reduce approach that combines the advantages of tree- and ring-based implementations while mitigating their drawbacks. FMCC-RT dynamically switches between tree- and ring-based implementations depending on the size of the message being processed. It uses an analytical model to assess the impact of message size on the achieved throughput, enabling the derivation of optimal work-partitioning parameters. Furthermore, FMCC-RT is designed with an Open MPI-compatible API, requiring no modification to user code. We evaluated FMCC-RT through micro-benchmarks and real-world application tests. Experimental results show that FMCC-RT outperforms state-of-the-art tree- and ring-based methods, achieving speedups of up to 5.6×.
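A hedged sketch of the size-based switching idea: the closed-form costs below are the standard alpha-beta (latency/bandwidth) estimates for binomial-tree and ring all-reduce, used here only to illustrate how a crossover message size can be derived. They are not FMCC-RT's actual analytical model, and the parameter values are placeholders.

```python
import math

def tree_cost(n_bytes, p, alpha, beta):
    # Binomial-tree reduce + broadcast: ~2*log2(p) rounds, full message each round.
    return 2 * math.ceil(math.log2(p)) * (alpha + n_bytes * beta)

def ring_cost(n_bytes, p, alpha, beta):
    # Ring reduce-scatter + all-gather: 2*(p-1) steps carrying n/p bytes each.
    return 2 * (p - 1) * (alpha + (n_bytes / p) * beta)

def pick_scheme(n_bytes, p, alpha=1e-6, beta=1e-9):
    """Return 'tree' for latency-bound small messages, 'ring' otherwise."""
    return "tree" if tree_cost(n_bytes, p, alpha, beta) <= ring_cost(n_bytes, p, alpha, beta) else "ring"

if __name__ == "__main__":
    for size in (1 << 10, 1 << 20, 1 << 28):   # 1 KiB, 1 MiB, 256 MiB
        print(size, pick_scheme(size, p=64))
```

For small messages the tree's few latency-dominated rounds win, while for large messages the ring's per-step payload of n/p bytes makes it bandwidth-friendly, which mirrors the trade-off the abstract describes.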
ISBN (digital): 9798331506797
ISBN (print): 9798331506803
The solution of the collision-propagation equation plays an important role in thermal-fluid multi-physics coupling simulation. Through collision-propagation operations on different physical quantities, the lattice Boltzmann method (LBM) can integrate physical fields such as the fluid and thermal fields into the same framework to achieve a coupled solution. In this paper, the collision-propagation equation is realized through LBM, and the simulation process is further componentized to improve the efficiency and reusability of the simulation. The component library includes the main control module, geometric modeling of the simulation field, simulation parameter setting, simulation computing, and a visualization component. In LBM, since the particle motion on each lattice node is independent, the componentized approach makes it easier to assign simulation tasks to multiple processors or compute nodes, thereby increasing computational efficiency.
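For context, the collision-propagation update referred to above is most commonly written in the single-relaxation-time (BGK) form below; this is the standard textbook formulation, not notation taken from the paper.

```latex
% Standard LBM-BGK collision-propagation (streaming) update, textbook form:
% f_i is the particle distribution along lattice direction e_i, tau the relaxation time.
\[
f_i(\mathbf{x} + \mathbf{e}_i\,\Delta t,\; t + \Delta t)
  = f_i(\mathbf{x}, t)
  - \frac{1}{\tau}\left[ f_i(\mathbf{x}, t) - f_i^{\mathrm{eq}}(\mathbf{x}, t) \right]
\]
```

In the common double-distribution-function approach, a second distribution with the same collide-and-stream structure carries the temperature field, which is what lets fluid and thermal quantities share one framework and be updated independently, lattice node by lattice node, in parallel.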
Association between features has been demonstrated to improve the representation ability of data. However, the original association-based data reconstruction method may face two issues: the dimension of the reconstructed data is inevitably higher than that of the original data, and the adopted association measure does not balance effectiveness and efficiency well. To address these two issues, this paper proposes a novel association-based representation improvement method, named AssoRep. AssoRep first obtains the association between features via the distance correlation method, which has some advantages over Pearson's correlation coefficient. Then an enhancement matrix is formed by stacking the association value of every pair of features. Next, an improved feature representation is obtained by aggregating the original features with the enhancement matrix. Finally, the improved feature representation is mapped to a low-dimensional space via principal component analysis. The effectiveness of AssoRep is validated on 120 datasets, and the results further refine our previous work on association-based data reconstruction.
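The sketch below walks through an AssoRep-style pipeline under stated assumptions: the distance-correlation computation is the standard sample estimator, but the aggregation rule (appending association-weighted features to the originals) is a placeholder, since the abstract does not specify how the enhancement matrix and the original features are combined.

```python
import numpy as np
from sklearn.decomposition import PCA

def distance_correlation(x, y):
    """Sample distance correlation between two 1-D feature vectors."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-center the pairwise distance matrices.
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if denom == 0 else np.sqrt(max(dcov2, 0.0) / denom)

def assorep_like(X, n_components=10):
    """Hedged sketch of an AssoRep-style pipeline (aggregation rule is assumed)."""
    n, d = X.shape
    # Enhancement matrix: pairwise feature-feature distance correlations.
    M = np.ones((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            M[i, j] = M[j, i] = distance_correlation(X[:, i], X[:, j])
    # Assumed aggregation: append association-weighted features to the originals.
    X_aug = np.hstack([X, X @ M])
    return PCA(n_components=min(n_components, n, X_aug.shape[1])).fit_transform(X_aug)
```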
A class of geometric asynchronous parallel algorithms for solving large-scale discrete PDE eigenvalue problems has been studied by the author (Sun in Sci China Math 41(8): 701-725, 2011; Sun in Math Numer Sin 34(1): 1-24, 2012; Sun in J Numer Methods Comput Appl 42(2): 104-125, 2021; Sun in Math Numer Sin 44(4): 433-465, 2022; Sun in Sci China Math 53(6): 859-894, 2023; Sun et al. in Chin Ann Math Ser B 44(5): 735-752, 2023). Different from traditional preconditioning algorithms that operate on the discrete matrix directly, our geometric pre-processing algorithm (GPA) is based on a so-called intrinsic geometric invariance, i.e., commutativity between the stiffness matrix A and the grid mesh matrix G: AG = GA. Thus, the large-scale system solver can be replaced with a much smaller block solver as a pre-processing step. In this paper, we study a single PDE and assume G satisfies a periodic condition G^m = I, m << dim(G). Four special cases are studied: the two-point ODE eigenproblem, and Laplace eigenproblems over an L-shaped region, a square ring, and a 3D hexahedron. Two conclusions are obtained: "the parallelism of the geometric mesh pre-transformation is mainly proportional to the number of faces of the polyhedron" and "commutativity of the grid mesh matrix and the mass matrix is the essential condition for the GPA algorithm".
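A toy numerical illustration (not the GPA algorithm itself) of the commutativity property AG = GA with G^m = I: here A is the periodic 1D Laplacian and G the cyclic shift, and changing to G's eigenbasis splits the eigenproblem of A into independent small blocks.

```python
import numpy as np

# Toy setup: periodic 1D Laplacian A and cyclic shift G with G^n = I (assumed example).
n = 8
G = np.roll(np.eye(n), 1, axis=0)          # grid/mesh permutation, G^n = I
A = 2 * np.eye(n) - G - G.T                # periodic 1D Laplacian (circulant)

assert np.allclose(A @ G, G @ A)                              # intrinsic invariance AG = GA
assert np.allclose(np.linalg.matrix_power(G, n), np.eye(n))   # periodic condition G^m = I

# Because A and G commute and G has distinct eigenvalues, G's eigenbasis
# block-diagonalizes A, so the big eigenproblem splits into small independent ones.
w, V = np.linalg.eig(G)
A_hat = np.linalg.inv(V) @ A @ V
off_block = A_hat - np.diag(np.diag(A_hat))
print("max off-block magnitude:", np.abs(off_block).max())    # ~1e-15, i.e. fully decoupled
```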
The naive Bayesian classifier (NBC) is a supervised machine learning algorithm with a simple model structure and good theoretical interpretability. However, the generalization performance of NBC is limited to a large extent by the assumption of attribute independence. To address this issue, this paper proposes a novel attribute grouping-based NBC (AG-NBC), which is a variant of the classical NBC trained with different attribute groups. AG-NBC first applies a novel and effective objective function to automatically identify optimal dependent attribute groups (DAGs). Condition attributes in the same DAG are strongly dependent on the class attribute, whereas attributes in different DAGs are independent of one another. Then, for each DAG, a random vector functional link (RVFL) network with a SoftMax layer is trained to output posterior probabilities in the form of joint probability density estimation. The NBC is then trained using the grouped attributes that correspond to the original condition attributes. Extensive experiments were conducted to validate the rationality, feasibility, and effectiveness of AG-NBC. Our findings show that the attribute groups chosen for NBC accurately represent attribute dependencies and reduce overlaps between different posterior probability densities. In addition, comparative results with NBC, flexible NBC (FNBC), tree-augmented Bayes network (TAN), gain ratio-based attribute-weighted naive Bayes (GRAWNB), averaged one-dependence estimators (AODE), weighted AODE (WAODE), independent component analysis-based NBC (ICA-NBC), the hidden naive Bayesian (HNB) classifier, and the correlation-based feature weighting filter for naive Bayes (CFW) show that AG-NBC obtains statistically better testing accuracies, higher areas under the receiver operating characteristic curve (AUCs), and lower probability mean square errors (PMSEs) than the other Bayesian classifiers. The experimental results demonstrate that AG-NBC is a valid and efficient approach for alleviating the attribute independence assumption of NBC.
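To make the modeling change concrete, the factorizations below contrast the standard NBC assumption with the group-wise assumption described in the abstract; the notation is ours, not the paper's.

```latex
% Standard NBC: every attribute x_j is conditionally independent given class c.
\[
P(c \mid \mathbf{x}) \;\propto\; P(c)\prod_{j=1}^{d} P(x_j \mid c)
\]
% AG-NBC-style grouping: independence is assumed only between the attribute
% groups G_1, ..., G_K; within each group, the joint density P(x_{G_k} | c)
% is estimated by a trained network (an RVFL with a SoftMax layer).
\[
P(c \mid \mathbf{x}) \;\propto\; P(c)\prod_{k=1}^{K} P\!\left(\mathbf{x}_{G_k} \mid c\right)
\]
```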
Large-scale deep learning models are trained distributedly due to memory and computing resource constraints. Few existing strategy generation approaches take optimal memory minimization as the objective. To fill in this gap, we propose a novel algorithm that generates optimal parallelism strategies with the constraint of minimal memory usage. We propose a novel redundant memory cost model to calculate the memory overhead of each operator in a given parallel strategy. To generate the optimal parallelism strategy, we formulate the parallelism strategy search problem into an integer linear programming problem and use an efficient solver to find minimal-memory intra-operator parallelism strategies. Moreover, the proposed algorithm has been extended and implemented in a multi-dimensional parallel training framework and is characterized by high throughput and minimal memory usage. Experimental results demonstrate that our approach achieves memory savings of up to 67% compared to the latest Megatron-LM strategies; in contrast, the gap between the throughput of our approach and that of its counterparts remains small.
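As an illustrative and heavily simplified sketch of the ILP formulation mentioned above, the toy model below picks one parallelism strategy per operator to minimize a made-up redundant-memory cost; the operator names, costs, and the single side constraint are assumptions, not the paper's model.

```python
import pulp

# Toy data: per-operator candidate strategies -> (redundant memory in GB, relative step time).
ops = {
    "matmul_1": {"data_parallel": (12, 1.0), "tensor_parallel": (6, 1.3)},
    "matmul_2": {"data_parallel": (10, 1.0), "tensor_parallel": (5, 1.4)},
}

prob = pulp.LpProblem("min_redundant_memory", pulp.LpMinimize)
x = {(o, s): pulp.LpVariable(f"x_{o}_{s}", cat="Binary")
     for o, strategies in ops.items() for s in strategies}

# Objective: total redundant memory across operators.
prob += pulp.lpSum(ops[o][s][0] * x[o, s] for o, s in x)

# Each operator picks exactly one strategy.
for o, strategies in ops.items():
    prob += pulp.lpSum(x[o, s] for s in strategies) == 1

# Example side constraint: keep total relative step time under a budget.
prob += pulp.lpSum(ops[o][s][1] * x[o, s] for o, s in x) <= 2.5

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({o: s for (o, s), var in x.items() if var.value() == 1})
```

The toy only mirrors the "one strategy per operator, minimize memory, subject to a side constraint" structure of the search problem.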
Partial-label learning (PLL) is a typical weakly supervised learning problem, where each training instance is annotated with a set of candidate labels. Self-training PLL models achieve state-of-the-art performance but suffer from error accumulation caused by mistakenly disambiguated instances. Although co-training can alleviate this issue by training two networks simultaneously and allowing them to interact with each other, most existing co-training methods train two structurally identical networks with the same task, i.e., they are symmetric, rendering them insufficient to correct each other due to their similar limitations. Therefore, in this paper, we propose an asymmetric dual-task co-training PLL model called AsyCo, which forces its two networks, i.e., a disambiguation network and an auxiliary network, to learn from different views explicitly by optimizing distinct tasks. Specifically, the disambiguation network is trained with a self-training PLL task to learn label confidence, while the auxiliary network is trained in a supervised learning paradigm to learn from the noisy pairwise similarity labels that are constructed according to the learned label confidence. Finally, the error accumulation problem is mitigated via information distillation and confidence refinement. Extensive experiments on both uniform and instance-dependent partially labeled datasets demonstrate the effectiveness of AsyCo.
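A minimal sketch of the pairwise-similarity construction the abstract describes, under the assumption (not stated in the abstract) that two instances are labeled "similar" when their most confident candidate labels coincide.

```python
import numpy as np

def pairwise_similarity_labels(confidence):
    """Build noisy pairwise similarity labels from a (n_samples, n_classes)
    label-confidence matrix produced by the disambiguation network.

    Assumption for this sketch: instances i and j are 'similar' (label 1)
    when their argmax-confidence classes agree, and 'dissimilar' (0) otherwise.
    """
    pseudo = confidence.argmax(axis=1)              # most confident candidate label
    return (pseudo[:, None] == pseudo[None, :]).astype(np.float32)

# Example: 4 instances, 3 classes.
conf = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.8, 0.1],
                 [0.6, 0.3, 0.1],
                 [0.2, 0.2, 0.6]])
print(pairwise_similarity_labels(conf))
```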
Sparse triangular solve (SpTRSV) is a vital component in various scientific applications, and numerous GPU-based SpTRSV algorithms have been proposed. Synchronization-free SpTRSV is currently the mainstream algorithm ...