Search Results - Inner Mongolia University Library

An intelligent mesh-smoothing method with graph neural networks

Frontiers of Information Technology & Electronic Engineering, 2025, Vol. 26, No. 3, pp. 367-384

Authors: Zhichao WANG, Xinhai CHEN, Junjun YAN, Jie LIU. Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China; Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Changsha 410073, China

In computational fluid dynamics (CFD), mesh-smoothing methods are widely used to refine the mesh quality for achieving high-precision numerical ***, optimization-based smoothing is used for high-quality mesh smoothing, but it incurs significant computational *** works have improved its smoothing efficiency by adopting supervised learning to learn smoothing methods from high-quality ***, they pose difficulties in smoothing the mesh nodes with varying degrees and require data augmentation to address the node input sequence ***, the required labeled high-quality meshes further limit the applicability of the proposed *** this paper, we present graph-based smoothing mesh net (GMSNet), a lightweight neural network model for intelligent mesh *** adopts graph neural networks (GNNs) to extract features of the node's neighbors and outputs the optimal node *** smoothing, we also introduce a fault-tolerance mechanism to prevent GMSNet from generating negative volume *** a lightweight model, GMSNet can effectively smooth mesh nodes with varying degrees and remain unaffected by the order of input data. A novel loss function, MetricLoss, is developed to eliminate the need for high-quality meshes, which provides stable and rapid convergence during *** compare GMSNet with commonly used mesh-smoothing methods on two-dimensional (2D) triangle *** results show that GMSNet achieves outstanding mesh-smoothing performances with 5% of the model parameters compared to the previous model, but offers a speedup of 13.56 times over the optimization-based smoothing.

Keywords: Unstructured mesh; Mesh smoothing; Graph neural network; Optimization-based smoothing

FMCC-RT: a scalable and fine-grained all-reduce algorithm for large-scale SMP clusters

Science China (Information Sciences), 2025, Vol. 68, No. 5, pp. 362-379

Authors: Jintao PENG, Jie LIU, Jianbin FANG, Min XIE, Yi DAI, Zhiquan LAI, Bo YANG, Chunye GONG, Xinjun MAO, Guo MAO, Jie REN. Affiliations: School of Computer Science and Technology, National University of Defense Technology; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology; Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology; National Supercomputer Center in Tianjin; School of Computer Science, Shaanxi Normal University

All-reduce is a widely used communication technique for distributed and parallel applications, typically implemented using either a tree-based or ring-based scheme. Each of these approaches has its own limitations: tree-based schemes struggle with efficiently exchanging large messages, while ring-based solutions assume constant communication throughput, an unrealistic expectation in modern network communication infrastructures. We present FMCC-RT, an all-reduce approach that combines the advantages of tree- and ring-based implementations while mitigating their drawbacks. FMCC-RT dynamically switches between tree- and ring-based implementations depending on the size of the message being processed. It utilizes an analytical model to assess the impact of message sizes on the achieved throughput, enabling the derivation of optimal work partitioning parameters. Furthermore, FMCC-RT is designed with an Open MPI-compatible API, requiring no modification to user code. We evaluated FMCC-RT through micro-benchmarks and real-world application tests. Experimental results show that FMCC-RT outperforms state-of-the-art tree- and ring-based methods, achieving speedups of up to 5.6×.

Keywords: all-reduce; collective communication; MPI; scalability
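
The trade-off the abstract describes (trees suit small messages, rings suit large ones) can be illustrated with the textbook latency-bandwidth cost model. This is only a sketch: it is not FMCC-RT's analytical model, and it makes exactly the constant-throughput assumption that the abstract calls unrealistic. The function names and the `alpha` (per-message latency, seconds) and `beta` (seconds per byte) defaults are hypothetical.

```python
import math


def tree_allreduce_time(p, n_bytes, alpha, beta):
    """Reduce up a binary tree, then broadcast down it.

    Every hop carries the full message, so the cost is
    2 * ceil(log2 p) * (alpha + n * beta).
    """
    steps = 2 * math.ceil(math.log2(p))
    return steps * (alpha + n_bytes * beta)


def ring_allreduce_time(p, n_bytes, alpha, beta):
    """Reduce-scatter followed by all-gather on a ring.

    There are 2 * (p - 1) hops, each carrying n / p bytes.
    """
    steps = 2 * (p - 1)
    return steps * (alpha + (n_bytes / p) * beta)


def choose_allreduce(p, n_bytes, alpha=2e-6, beta=1e-10):
    """Pick the scheme with the lower modelled time (assumed parameters)."""
    t_tree = tree_allreduce_time(p, n_bytes, alpha, beta)
    t_ring = ring_allreduce_time(p, n_bytes, alpha, beta)
    return "tree" if t_tree <= t_ring else "ring"
```

Under these assumed parameters, a few bytes across 64 processes favour the tree because latency dominates, while tens of megabytes favour the ring because bandwidth dominates.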

AFMA-Track: Adaptive Fusion of Motion and Appearance for Robust Multi-object Tracking

27th International Conference on Pattern Recognition, ICPR 2024

Authors: Liao, Wei; Luo, Lei; Zhang, Chunyuan. Affiliations: College of Computer Science and Technology, National University of Defence Technology, Changsha, China; Science and Technology on Parallel and Distributed Processing Laboratory, College of Computer Science and Technology, National University of Defense Technology, Changsha, China

ISBN (print): 9783031784439

Motion and appearance cues play a crucial role in Multi-object Tracking (MOT) algorithms for associating objects across consecutive frames. While most MOT methods prioritize accurate motion modeling and distinctive appearance representations, the use of appearance and motion cues is often confined to simplistic association techniques. For instance, fixed weights are commonly employed to combine the intersection-over-union (IoU) matrix and appearance similarity matrix, yielding an association cost matrix. To harness the full potential of motion and appearance cues across diverse scenarios, we propose an innovative approach that dynamically balances motion and appearance cues based on scene and object information during the association process. Furthermore, we introduce a new mechanism for updating appearance representations, effectively mitigating noise introduced by occlusion. Our method demonstrates state-of-the-art performance on the MOT17 and MOT20 test sets. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Object detection
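
The fixed-weight baseline that the abstract contrasts itself with can be sketched as follows. This shows the simple association scheme, not AFMA-Track's adaptive fusion. The box format `(x1, y1, x2, y2)`, the use of cosine similarity for appearance, and the names `fused_cost` and `w_motion` are assumptions made for illustration.

```python
import math


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    """Cosine similarity of two appearance embeddings."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def fused_cost(track_boxes, det_boxes, track_feats, det_feats, w_motion=0.5):
    """Association cost matrix: 1 - (w * IoU + (1 - w) * appearance).

    w_motion is a single fixed weight shared by every track/detection pair.
    """
    cost = []
    for tb, tf in zip(track_boxes, track_feats):
        row = []
        for db, df in zip(det_boxes, det_feats):
            sim = w_motion * iou(tb, db) + (1.0 - w_motion) * cosine(tf, df)
            row.append(1.0 - sim)
        cost.append(row)
    return cost
```

The resulting matrix would typically be handed to an assignment solver. Because `w_motion` is constant, a crowded scene and a sparse one are weighted identically, which is the limitation the abstract targets.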

Optimizing Fine-Tuning in Quantized Language Models: An In-Depth Analysis of Key Variables

Computers, Materials & Continua, 2025, Vol. 82, No. 1, pp. 307-325

Authors: Ao Shen, Zhiquan Lai, Dongsheng Li, Xiaoyu Hu. Affiliations: National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha 410073, China; Strategic Assessments and Consultation Institute, Academy of Military Science, Beijing 100091, China

Large-scale Language Models (LLMs) have achieved significant breakthroughs in Natural Language Processing (NLP), driven by the pre-training and fine-tuning *** this approach allows models to specialize in specific tasks with reduced training costs, the substantial memory requirements during fine-tuning present a barrier to broader ***-Efficient Fine-Tuning (PEFT) techniques, such as Low-Rank Adaptation (LoRA), and parameter quantization methods have emerged as solutions to address these challenges by optimizing memory usage and computational *** these, QLoRA, which combines PEFT and quantization, has demonstrated notable success in reducing memory footprints during fine-tuning, prompting the development of various QLoRA *** these advancements, the quantitative impact of key variables on the fine-tuning performance of quantized LLMs remains *** study presents a comprehensive analysis of these key variables, focusing on their influence across different layer types and depths within LLM *** investigation uncovers several critical findings: (1) Larger layers, such as MLP layers, can maintain performance despite reductions in adapter rank, while smaller layers, like self-attention layers, are more sensitive to such changes; (2) The effectiveness of balancing factors depends more on specific values rather than layer type or depth; (3) In quantization-aware fine-tuning, larger layers can effectively utilize smaller adapters, whereas smaller layers struggle to do *** insights suggest that layer type is a more significant determinant of fine-tuning success than layer depth when optimizing quantized ***, for the same discount of trainable parameters, reducing the trainable parameters in a larger layer is more effective in preserving fine-tuning accuracy than in a smaller *** study provides valuable guidance for more efficient fine-tuning strategies and opens avenues for further research into optimizing LLM

Keywords: Large-scale Language Model; Parameter-Efficient Fine-Tuning; parameter quantization; key variable; trainable parameters; experimental analysis

DaCP: Accelerating Synchronization-Free SpTRSV via GPU-Friendly Data Communication and Parallelism Strategies

20th IFIP WG 10.3 International Conference on Network and Parallel Computing, NPC 2024

Authors: Guo, Mingfeng; Deng, Liang; Dai, Zhe; Li, Ruitian; Lin, Gaofeng; Liu, Jie. Affiliations: Computational Aerodynamics Institute, China Aerodynamics Research and Development Center, Mianyang, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China

ISBN (print): 9789819628292

Sparse triangular solve (SpTRSV) is a vital component in various scientific applications, and numerous GPU-based SpTRSV algorithms have been proposed. Synchronization-free SpTRSV is currently the mainstream algorithm on GPU due to its short preprocessing time and outstanding performance. However, we observed that this algorithm still has two performance bottlenecks. Firstly, the thread-level parallel mode can introduce thread divergence issues within GPU warps during the writing phase. Secondly, the thread-level and warp-level fusion mode may struggle to fully exploit GPU resources due to suboptimal mapping relationships between rows and threads. To address these issues, this paper proposes DaCPSpTRSV, a new synchronization-free algorithm with GPU-friendly data communication and parallelism strategies. Specifically, we first develop a fast-forward thread-level approach, incorporating an efficient global memory access pattern and a light-weight dependency control mechanism, to optimize data communication and alleviate thread divergence. A fine-grained fusion strategy is then proposed to maximize GPU parallelism by adaptively selecting the suitable thread-level or warp-level modes. Moreover, the commonly-used compressed sparse row (CSR) format is employed in our DaCPSpTRSV, enhancing the versatility of our algorithm. We evaluate our approach using 245 matrices from the SuiteSparse Matrix Collection on two NVIDIA GPUs, demonstrating speedup ratios of up to 4.77×, 4.94×, 1.67×, and 1.62× compared to cuSPARSE, Sync-Free, CapelliniSpTRSV, and YuenyeungSpTRSV, respectively. The project is open-sourced at https://***/gmfff12334/DaCP. © IFIP International Federation for Information Processing 2025.

Keywords: Graphics processing unit
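
For readers unfamiliar with the operation the abstract optimizes, a serial reference for a lower-triangular solve on a CSR matrix might look like the sketch below. It is not the DaCPSpTRSV algorithm and contains none of its GPU strategies. It only shows the row-to-row data dependency (row `i` needs every earlier `x[j]` it references) that synchronization-free GPU solvers must respect. The function name and the argument layout are illustrative assumptions.

```python
def sptrsv_csr_lower(row_ptr, col_idx, values, b):
    """Solve L x = b by forward substitution, with L lower-triangular in CSR.

    row_ptr[i]:row_ptr[i + 1] delimits the nonzeros of row i in
    col_idx / values.
    """
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        acc = b[i]
        diag = None
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j == i:
                diag = values[k]          # diagonal entry of row i
            elif j < i:
                acc -= values[k] * x[j]   # depends on an already-solved x[j]
        if diag is None or diag == 0:
            raise ValueError(f"row {i} has no nonzero diagonal entry")
        x[i] = acc / diag
    return x
```

For the 3×3 matrix with rows `[2, 0, 0]`, `[1, 3, 0]`, `[0, 4, 5]` and right-hand side `[2, 7, 23]`, the call `sptrsv_csr_lower([0, 1, 3, 5], [0, 0, 1, 1, 2], [2, 1, 3, 4, 5], [2, 7, 23])` yields `[1.0, 2.0, 3.0]`.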

YFLM: An Improved Levenberg-Marquardt Algorithm for Global Bundle Adjustment

41st Computer Graphics International Conference, CGI 2024

Authors: Peng, Jiaxin; Li, Tao; Jiang, Qin; Liu, Jie; Wang, Ruibo. Affiliations: Laboratory of Software Engineering for Complex Systems, School of Computer Science, National University of Defense Technology, Hunan, Changsha 410073, China; Parallel and Distributed Processing Laboratory, School of Computer Science, National University of Defense Technology, Hunan, Changsha 410073, China

ISBN (print): 9783031820205

The conventional Levenberg-Marquardt (LM) algorithm is a state-of-the-art trust-region optimization method for solving bundle adjustment problems in the Structure-from-Motion community, which not only takes advantage of the fast convergence of the Gauss-Newton method, but also the stability of the gradient descent method when approaching optimal solutions. However, the damping ratio of LM is simply provided by trial-and-error, which causes a slow convergence rate for large-scale problems. This paper proposes the Yamashita-Fukushima LM (YFLM) algorithm to reduce the time complexity for global bundle adjustment, where the damping factor is determined by Yamashita and Fukushima's method. YFLM dynamically calculates a more reasonable and optimal damping ratio according to the newest reprojection error. The experimental results show that the YFLM algorithm outperforms the conventional LM algorithm for most public bundle adjustment datasets. Besides this, the convergence of the YFLM algorithm is also evaluated with different σ∈(0,2]. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Damping
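
The idea of deriving the damping from the current error rather than by trial and error can be sketched on a toy one-parameter problem. The sketch assumes damping of the form μ = ‖r‖^σ, which is the form usually associated with Yamashita and Fukushima's analysis (σ = 2). Whether YFLM uses exactly this expression is not stated in the abstract, and the exponential-fit problem and the name `lm_fit_rate` are purely illustrative, not bundle adjustment.

```python
import math


def lm_fit_rate(ts, ys, a0=0.0, sigma=2.0, tol=1e-12, max_iter=50):
    """Fit y = exp(a * t) for scalar a with Levenberg-Marquardt steps.

    Assumed damping rule: mu = ||r|| ** sigma, recomputed from the
    current residual at every iteration (no manual tuning).
    """
    a = a0
    for _ in range(max_iter):
        r = [math.exp(a * t) - y for t, y in zip(ts, ys)]   # residuals
        J = [t * math.exp(a * t) for t in ts]                # dr/da
        norm_r = math.sqrt(sum(ri * ri for ri in r))
        if norm_r < tol:
            break
        mu = norm_r ** sigma
        # Scalar LM step: -(J^T r) / (J^T J + mu)
        a += -sum(j * ri for j, ri in zip(J, r)) / (sum(j * j for j in J) + mu)
    return a
```

As the residual shrinks, μ shrinks with it, so the update approaches a Gauss-Newton step near the solution while staying more heavily damped far from it.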

A data representation method using distance correlation

Frontiers of Computer Science, 2025, Vol. 19, No. 1, pp. 1-14

Authors: Xinyan LIANG, Yuhua QIAN, Qian GUO, Keyin ZHENG. Affiliations: Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006, China; Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China; School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China; Shanxi Key Laboratory of Big Data Analysis and Parallel Computing, Taiyuan University of Science and Technology, Taiyuan 030024, China

Association between features has been demonstrated to improve the representation ability of data. However, the original association data reconstruction method may face two issues: the dimension of the reconstructed data is undoubtedly higher than that of the original data, and the adopted association measure does not balance effectiveness and efficiency well. To address these two issues, this paper proposes a novel association-based representation improvement method named AssoRep. AssoRep first obtains the association between features via the distance correlation method, which has some advantages over Pearson's correlation coefficient. Then an improved matrix is formed by stacking the association value of any two features. Next, an improved feature representation is obtained by aggregating the original feature with the enhancement matrix. Finally, the improved feature representation is mapped to a low-dimensional space via principal component analysis. The effectiveness of AssoRep is validated on 120 datasets, and the results further perfect our previous work on association data reconstruction.

Keywords: association representation distance correlation classification
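
One advantage of distance correlation over Pearson's coefficient, which the abstract mentions without elaborating, is that it also detects nonlinear dependence. The sketch below implements the standard sample distance correlation for two one-dimensional feature columns. It covers only the association measure, not the rest of the AssoRep pipeline, and the function names are illustrative.

```python
import math


def _centered_distances(v):
    """Pairwise |v_i - v_j| matrix, double-centered (rows, columns, grand mean)."""
    n = len(v)
    d = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in d]   # equals column means: d is symmetric
    grand = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand for j in range(n)]
            for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length 1-D sequences."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(a[i][j] * b[i][j] for i, j in pairs) / (n * n)
    dvar_x = sum(a[i][j] ** 2 for i, j in pairs) / (n * n)
    dvar_y = sum(b[i][j] ** 2 for i, j in pairs) / (n * n)
    denom = math.sqrt(dvar_x * dvar_y)
    if denom <= 0:
        return 0.0                            # a constant feature carries no association
    return math.sqrt(max(dcov2, 0.0) / denom)
```

For `x = [-2, -1, 0, 1, 2]` and `y = [v * v for v in x]`, Pearson's coefficient is exactly zero because the relationship is symmetric, yet the distance correlation is clearly positive.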

Deep Time Series Anomaly Detection with Local Temporal Pattern Learning

2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025

Authors: Li, Yizhou; Wang, Yijie; Xu, Hongzuo; Zhou, Xiaohui. Affiliations: National Key Laboratory of Parallel and Distributed Computing, College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China; Beijing 100091, China

ISBN (print): 9798350368741

Self-supervised time series anomaly detection (TSAD) demonstrates remarkable performance improvement by extracting high-level data semantics through proxy tasks. Nonetheless, most existing self-supervised TSAD techniques rely on manual- or neural-based transformations when designing proxy tasks, overlooking the intrinsic temporal patterns of time series. This paper proposes a local temporal pattern learning-based time series anomaly detection (LTPAD). LTPAD first generates sub-sequences. Pairwise sub-sequences naturally manifest proximity relationships along the time axis, and such correlations can be used to construct supervision and train neural networks to facilitate the learning of temporal patterns. Time intervals between two sub-sequences serve as labels for sub-sequence pairs. By classifying these labeled data pairs, our model captures the local temporal patterns of time series, thereby modeling the temporal pattern-aware "normality". Abnormal scores of testing data are acquired by evaluating their conformity to these learned patterns shared in training data. Extensive experiments show that LTPAD significantly outperforms state-of-the-art competitors. © 2025 IEEE.

Keywords: Local Temporal Pattern; Self-supervised Learning; Time Series Anomaly Detection
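
The proxy task described in the abstract (pairs of sub-sequences labelled by the time interval between them) can be sketched as a data-preparation step. The abstract does not say how sub-sequences are cut or how intervals map to classes, so the sliding-window scheme, the parameters `window`, `stride` and `max_gap`, and the zero-based class labels below are all assumptions.

```python
def make_interval_pairs(series, window, stride, max_gap):
    """Build (sub_a, sub_b, label) triples for interval classification.

    Sub-sequences are sliding windows. Two windows that are `gap`
    strides apart (1 <= gap <= max_gap) form a pair whose class
    label is gap - 1.
    """
    starts = range(0, len(series) - window + 1, stride)
    subs = [series[s:s + window] for s in starts]
    pairs = []
    for i in range(len(subs)):
        for gap in range(1, max_gap + 1):
            j = i + gap
            if j < len(subs):
                pairs.append((subs[i], subs[j], gap - 1))
    return pairs
```

A classifier trained on these triples has to learn how a window typically evolves over one, two, or more strides. Test windows whose pairs are classified poorly would then receive higher anomaly scores.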

Graph Structure Learning via Transfer Entropy for Multivariate Time Series Anomaly Detection

2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025

Authors: Liu, Mingyu; Wang, Yijie; Zhou, Xiaohui; Wang, Yongjun. Affiliations: National Key Laboratory of Parallel and Distributed Computing, College of Computer Science and Technology, National University of Defense Technology, Changsha, China; College of Computer Science and Technology, National University of Defense Technology, Changsha, China

ISBN (print): 9798350368741

Multivariate time series anomaly detection (MTAD) poses a challenge due to temporal and feature dependencies. The critical aspects of enhancing the detection performance lie in accurately capturing the dependencies between variables within the sliding window and effectively leveraging them. Existing studies rely on domain knowledge to pre-set the window size, and overlook the strength of dependencies while calculating direction based on variable similarity. This paper proposes GSLTE, a graph structure learning method for MTAD. GSLTE employs Fast Fourier Transform to conduct iterative segmentation of the whole series, selecting the dominant Fourier frequency as the window size for each subsequence within the minimum interval. GSLTE quantifies the direction and strength of the dependencies based on variable-lag transfer entropy which is achieved through Dynamic Time Warping method to learn asymmetric links between variables. Extensive experiments show that GNN-based MTAD methods applying GSLTE can further improve anomaly detection performance while outperforming state-of-the-art competitors. © 2025 IEEE.

Keywords: Anomaly detection; Graph structure learning; Multivariate time series; Window size selection
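
The step of turning the dominant Fourier frequency into a window size can be illustrated in isolation. The sketch covers that single step only. GSLTE's iterative segmentation, its minimum-interval rule, and the transfer-entropy stage are not reproduced. A naive O(n²) discrete Fourier transform is used to stay dependency-free where a real implementation would use an FFT, and the name `dominant_period` is hypothetical.

```python
import cmath
import math


def dominant_period(series):
    """Return the period (in samples) of the strongest non-DC frequency.

    The mean is removed first so the zero-frequency term cannot win.
    A frequency index k over n samples corresponds to a period of n / k.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return round(n / best_k)
```

On 64 samples of a sine wave with an 8-sample cycle plus a weaker 4-sample component, the function returns 8, which would then serve as the sliding-window length for that segment.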

Comprehensive Deadlock Prevention for GPU Collective Communication

20th European Conference on Computer Systems, EuroSys 2025, co-located 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2025

Authors: Pan, Lichen; Liu, Juncheng; Fu, Yongquan; Yuan, Jinhui; Zhang, Rongkai; Li, Pengze; Xiao, Zhen. Affiliations: School of Computer Science, Peking University, China; OneFlow Research, China; National Key Laboratory of Parallel and Distributed Computing, College of Computer Science and Technology, National University of Defense Technology, China

ISBN (print): 9798400711961

Distributed deep neural network training necessitates efficient GPU collective communications, which are inherently susceptible to deadlocks. GPU collective deadlocks arise easily in distributed deep learning applications when multiple collectives circularly wait for each other. GPU collective deadlocks pose a significant challenge to the correct functioning and efficiency of distributed deep learning, and no general effective solutions are currently available. Only in specific scenarios, ad-hoc methods, making an application invoke collectives in a consistent order across GPUs, can be used to prevent circular collective dependency and deadlocks. This paper presents DFCCL, a novel GPU collective communication library that provides a comprehensive approach for GPU collective deadlock prevention while maintaining high performance. DFCCL achieves preemption for GPU collectives at the bottom library level, effectively preventing deadlocks even if applications cause circular collective dependency. DFCCL ensures high performance with its execution and scheduling methods for collectives. Experiments show that DFCCL effectively prevents GPU collective deadlocks in various situations. Moreover, extensive evaluations demonstrate that DFCCL delivers performance comparable to or superior to NCCL, the state-of-the-art collective communication library highly optimized for NVIDIA GPUs. © 2025 Copyright held by the owner/author(s).

Keywords: Deep neural networks
