检索结果-内蒙古大学图书馆

Isolate Sets Based parallel Louvain Method for Community Detection

Journal of Computer science & technology 2023年第2期38卷 373-390页

作者：郄航窦勇黄震熊运生 Science and Technology on Parallel and Distributed Laboratory School of Computer National University of Defense TechnologyChangsha 410073China

Community detection is a vital task in many fields,such as social networks and financial analysis,to name a *** Louvain method,the main workhorse of community detection,is a popular heuristic *** apply it to large-scale graph networks,researchers have proposed several parallel Louvain methods(PLMs),which suffer from two challenges:the latency in the information synchronization,and the community *** tackle these two challenges,we propose an isolate sets based parallel Louvain method(IPLM)and a fusion IPLM with the hashtables based Louvain method(FIPLM),which are based on a novel graph partition *** graph partition algorithm divides the graph network into subgraphs called isolate sets,in which the vertices are relatively decoupled from *** first describe the concepts and properties of the isolate *** we propose an algorithm to divide the graph network into isolate sets,which enjoys the same computation complexity as the breadth-first ***,we propose IPLM,which can efficiently calculate and update vertices information in parallel without latency or community ***,we achieve further acceleration by FIPLM,which maintains a high quality of community detection with a faster speedup than *** two methods are for shared-memory architecture,and we implement our methods on an 8-core PC;the experiments show that IPLM achieves a maximum speedup of 4.62x and outputs higher modularity(maximum 4.76%)than the serial Louvain method on 14 of 18 ***,FIPLM achieves a maximum speedup of 7.26x.

关键词： parallel computing isolate set graph partition Louvain method community detection

来源：评论

学校读者我要写书评

暂无评论

An intelligent mesh-smoothing method with graph neural networks

引用

Frontiers of Information technology & Electronic Engineering 2025年第3期26卷 367-384页

作者： Zhichao WANG Xinhai CHEN Junjun YAN Jie LIU Science and Technology on Parallel and Distributed Processing Laboratory National University of Defense TechnologyChangsha 410073China Laboratory of Digitizing Software for Frontier Equipment National University of Defense TechnologyChangsha 410073China

In computational fluid dynamics(CFD),mesh-smoothing methods are widely used to refine the mesh quality for achieving high-precision numerical ***,optimization-based smoothing is used for high-quality mesh smoothing,but it incurs significant computational *** works have improved its smoothing efficiency by adopting supervised learning to learn smoothing methods from high-quality ***,they pose difficulties in smoothing the mesh nodes with varying degrees and require data augmentation to address the node input sequence ***,the required labeled high-quality meshes further limit the applicability of the proposed *** this paper,we present graph-based smoothing mesh net(GMSNet),a lightweight neural network model for intelligent mesh *** adopts graph neural networks(GNNs)to extract features of the node’s neighbors and outputs the optimal node *** smoothing,we also introduce a fault-tolerance mechanism to prevent GMSNet from generating negative volume *** a lightweight model,GMSNet can effectively smooth mesh nodes with varying degrees and remain unaffected by the order of input data.A novel loss function,MetricLoss,is developed to eliminate the need for high-quality meshes,which provides stable and rapid convergence during *** compare GMSNet with commonly used mesh-smoothing methods on two-dimensional(2D)triangle *** results show that GMSNet achieves outstanding mesh-smoothing performances with 5%of the model parameters compared to the previous model,but offers a speedup of 13.56 times over the optimization-based smoothing.

关键词： Unstructured mesh Mesh smoothing Graph neural network Optimization-based smoothing

来源：评论

学校读者我要写书评

暂无评论

FMCC-RT: a scalable and fine-grained all-reduce algorithm for large-scale SMP clusters

引用

science China(Information sciences) 2025年第5期68卷 362-379页

作者： Jintao PENG Jie LIU Jianbin FANG Min XIE Yi DAI Zhiquan LAI Bo YANG Chunye GONG Xinjun MAO Guo MAO Jie REN School of Computer Science and Technology National University of Defense Technology Science and Technology on Parallel and Distributed Processing Laboratory National University of Defense Technology Laboratory of Digitizing Software for Frontier Equipment National University of Defense Technology National Supercomputer Center in Tianjin School of Computer Science Shaanxi Normal University

All-reduce is a widely used communication technique for distributed and parallel applications typically implemented using either a tree-based or ring-based scheme. Each of these approaches has its own limitations: tree-based schemes struggle with efficiently exchanging large messages, while ring-based solutions assume constant communication throughput,an unrealistic expectation in modern network communication infrastructures. We present FMCC-RT, an all-reduce approach that combines the advantages of tree-and ring-based implementations while mitigating their drawbacks. FMCC-RT dynamically switches between tree and ring-based implementations depending on the size of the message being processed. It utilizes an analytical model to assess the impact of message sizes on the achieved throughput, enabling the derivation of optimal work partitioning parameters. Furthermore, FMCC-RT is designed with an Open MPI-compatible API, requiring no modification to user code. We evaluated FMCC-RT through micro-benchmarks and real-world application tests. Experimental results show that FMCC-RT outperforms state-of-the-art tree-and ring-based methods, achieving speedups of up to 5.6×.

关键词： all-reduce collective communication MPI scalability

来源：评论

学校读者我要写书评

暂无评论

Optimizing Fine-Tuning in Quantized Language Models:An In-Depth Analysis of Key Variables

引用

Computers, Materials & Continua 2025年第1期82卷 307-325页

作者： Ao Shen Zhiquan Lai Dongsheng Li Xiaoyu Hu National Key Laboratory of Parallel and Distributed Computing National University of Defense TechnologyChangsha410073China Strategic Assessments and Consultation Institute Academy of Military ScienceBeijing100091China

Large-scale Language Models(LLMs)have achieved significant breakthroughs in Natural Language Processing(NLP),driven by the pre-training and fine-tuning *** this approach allows models to specialize in specific tasks with reduced training costs,the substantial memory requirements during fine-tuning present a barrier to broader ***-Efficient Fine-Tuning(PEFT)techniques,such as Low-Rank Adaptation(LoRA),and parameter quantization methods have emerged as solutions to address these challenges by optimizing memory usage and computational *** these,QLoRA,which combines PEFT and quantization,has demonstrated notable success in reducing memory footprints during fine-tuning,prompting the development of various QLoRA *** these advancements,the quantitative impact of key variables on the fine-tuning performance of quantized LLMs remains *** study presents a comprehensive analysis of these key variables,focusing on their influence across different layer types and depths within LLM *** investigation uncovers several critical findings:(1)Larger layers,such as MLP layers,can maintain performance despite reductions in adapter rank,while smaller layers,like self-attention layers,aremore sensitive to such changes;(2)The effectiveness of balancing factors depends more on specific values rather than layer type or depth;(3)In quantization-aware fine-tuning,larger layers can effectively utilize smaller adapters,whereas smaller layers struggle to do *** insights suggest that layer type is a more significant determinant of fine-tuning success than layer depth when optimizing quantized ***,for the same discount of trainable parameters,reducing the trainable parameters in a larger layer is more effective in preserving fine-tuning accuracy than in a smaller *** study provides valuable guidance for more efficient fine-tuning strategies and opens avenues for further research into optimizing LLM

关键词： Large-scale Language Model Parameter-Efficient Fine-Tuning parameter quantization key variable trainable parameters experimental analysis

来源：评论

学校读者我要写书评

暂无评论

A Heterogeneous KBA parallel Algorithm for the Cartesian Discrete Ordinates for Multizone Heterogeneous System 8

A Heterogeneous KBA Parallel Algorithm for the Cartesian Dis...

引用

8th International Conference on Computer and Communication Systems, ICCCS 2023

作者： Li, Runhua Liu, Jie National University of Defense Technology Science and Technology on Parallel and Distributed Processing Laboratory Changsha China

ISBN: (纸本)9781665456128

Innovations in powerful high-performance computing (HPC) architecture are enabling high-fidelity whole-core neutron transport simulations at reasonable time. Especially, the currently fashionable heterogeneous architectures make the cost of such simulations at very low level. Neutron distribution of a reactor core is governed by the Boltzmann neutron transport equation (BTE), first viable solutions of which need tremendous computer resources. Among of the high-fidelity numerical methods, the discrete ordinates method (SN) is becoming popular in the reaction design community by taking a good balance between computational cost and accuracy. Recently, MT-3000, which is a multizone heterogeneous architecture with a peak double precision performance of 11.6 TFLOPS, is proposed. In this work, the BTE is solved by the SN with heterogenous Koch-Baker-Alcouffe (KBA) parallel algorithms based on the MT-3000 architecture. A communication mechanism has been established to efficiently transmit data among the acceleration cores and the CPU cores. The kernel computation procedure is largely accelerated by the vectorization and instruction pipelining techniques. Numerical experiments show that our formulation could achieve 1.37 TFLOPs with single MT-3000, that is 11.8% of its peak performance. © 2023 IEEE.

关键词： parallel algorithms

来源：评论

学校读者我要写书评

暂无评论

SMVAR: A Novel RNN Accelerator Based on Non-blocking Data Distribution Structure 25

SMVAR: A Novel RNN Accelerator Based on Non-blocking Data Di...

引用

25th IEEE International Conferences on High Performance Computing and Communications, 9th International Conference on Data science and Systems, 21st IEEE International Conference on Smart City and 9th IEEE International Conference on Dependability in Sensor, Cloud and Big Data Systems and Applications, HPCC/DSS/SmartCity/DependSys 2023

作者： Xu, Jinwei Jiang, Jingfei Xu, Shiyao Gao, Lei National University of Defense Technology Science and Technology on Parallel and Distributed Laboratory Hunan Changsha China

ISBN: (纸本)9798350330014

Recurrent neural networks (RNNs) have become common models in the field of artificial intelligence to process temporal sequence task, such as speech recognition, text analysis, natural language processing, etc. To speedup RNNs inference, previous research proposed model sparse pruning techniques. However, the pruning rate of existing sparse pruning algorithms will be affected by pruning granularity and hardware friendliness. In order to approximate nonstructured pruning algorithm, this paper proposes Large Region Balanced Sparse (LRBS) pruning method, which does not limit sub-matrix shape and effectively improves pruning rate. Furthermore, we propose Sparse Matrix Vector Multiplication Accelerator for RNNs (SMVAR), which adopt non-blocking data distribution structure to solve the problem of efficient execution of large region irreg-ular matrix multiplication. To further improve the accelerator performance, SMVAR fine-grained adjusts the pipeline between macro-operations to reduce the idle of compute components. In addition, according to the coarse-grained block characteristics of LRBS algorithm, we develop the coarse-grained parallelism of accelerator with multiply compute units(CUs) structure. Experiments show that the pruning rate of our proposed LRBS is 1.25x-2.5x higher than that of the existing pruning algorithms. Compared with the existing work, the execution efficiency is improved by more than 2.02x-35.9x in the same application scenario. © 2023 IEEE.

关键词： Field programmable gate arrays (FPGA)

来源：评论

学校读者我要写书评

暂无评论

Serialization/Deserialization-free State Transfer in Serverless Workflows 24

Serialization/Deserialization-free State Transfer in Serverl...

引用

19th European Conference on Computer Systems, EuroSys 2024

作者： Lu, Fangming Wei, Xingda Huang, Zhuobin Chen, Rong Wu, Minyu Chen, Haibo Institute of Parallel and Distributed Systems Seiee Shanghai Jiao Tong University China Shanghai Artificial Intelligence Laboratory China University of Electronic Science and Technology of China China Institute of Parallel and Distributed Systems Germany

ISBN: (纸本)9798400704376

Serialization and deserialization play a dominant role in the state transfer time of serverless workflows, leading to substantial performance penalties during workflow execution. We identify the key reason as a lack of ability to efficiently access the (remote) memory of another function. We propose RMMap, an OS primitive for remote memory map. It allows a serverless function to directly access the memory of another function, even if it is located remotely. RMMap is the first to completely eliminates serialization and deserialization when transferring states between any pairs of functions in (unmodified) serverless workflows. To make remote memory map efficient and feasible, we co-design it with fast networking (RDMA), OS, language runtime, and serverless platform. Evaluations using real-world serverless workloads show that integrating RMMap with Knative reduces the serverless workflow execution time on Knative by up to 2.6 × and improves resource utilizations by 86.3%. © 2024 ACM.

关键词：

来源：评论

学校读者我要写书评

暂无评论

AFMA-Track: Adaptive Fusion of Motion and Appearance for Robust Multi-object Tracking 27th

AFMA-Track: Adaptive Fusion of Motion and Appearance for ...

引用

27th International Conference on Pattern Recognition, ICPR 2024

作者： Liao, Wei Luo, Lei Zhang, Chunyuan College of Computer Science and Technology National University of Defence Technology Changsha China Science and Technology on Parallel and Distributed Processing Laboratory College of Computer Science and Technology National University of Defense Technology Changsha China

ISBN: (纸本)9783031784439

Motion and appearance cues play a crucial role in Multi-object Tracking (MOT) algorithms for associating objects across consecutive frames. While most MOT methods prioritize accurate motion modeling and distinctive appearance representations, the use of appearance and motion cues is often confined to simplistic association techniques. For instance, fixed weights are commonly employed to combine the intersection-over-union (IoU) matrix and appearance similarity matrix, yielding an association cost matrix. To harness the full potential of motion and appearance cues across diverse scenarios, we propose an innovative approach that dynamically balances motion and appearance cues based on scene and object information during the association process. Furthermore, we introduce a new mechanism for updating appearance representations, effectively mitigating noise introduced by occlusion. Our method demonstrates state-of-the-art performance on the MOT17 and MOT20 test sets. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

关键词： Object detection

来源：评论

学校读者我要写书评

暂无评论

An unsupervised deep learning framework for gene regulatory network inference from single-cell expression data

An unsupervised deep learning framework for gene regulatory ...

引用

2023 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2023

作者： Mao, Guo Liu, Jie National University of Defense Technology Science and Technology on Parallel and Distributed Processing Laboratory Changsha410073 China National University of Defense Technology Science and Technology on Parallel and Distributed Processing Laboratory National University of Defense Technology Laboratory of Software Engineering for Complex System Changsha410073 China

ISBN: (纸本)9798350337488

Recent advances in single-cell RNA sequencing (scRNA-seq) technology provides unprecedented opportunities for reconstruction gene regulation networks (GRNs). At present, many different models have been proposed to infer GRN from a large number of RNA-seq data, but most deep learning models use a priori gene regulatory network to infer potential GRNs. It is a challenge to reconstruct GRNs from scRNA-seq data due to the noise and sparsity introduced by the dropout effect. Here, we propose GAALink, a novel unsupervised deep learning method. It first constructs the gene similarity matrix and then refines it by threshold value. It then learns feature representations of genes through a graphical attention autoencoder that propagates information across genes with different weights. Finally, we use gene feature expression for matrix completion such that the GRNs are reconstructed. Compared with seven existing GRNs reconstruction methods, GAALink achieves more accurate performance on seven scRNA-seq dataset with four ground truth networks. GAALink can provide a useful tool for inferring GRNs for scRNA-seq expression data. © 2023 IEEE.

关键词： Feature representations Gene similarity matrix Graphical attention autoencoder Matrix completion Unsupervised deep learning method

来源：评论

学校读者我要写书评

暂无评论

Guided Spatio-Temporal Learning Method for 4K Video Super-Resolution 5

Guided Spatio-Temporal Learning Method for 4K Video Super-Re...

引用

5th ACM International Conference on Multimedia in Asia, MMAsia 2023

作者： Jiang, Qin Wang, Qinglin Liu, Jie Science and Technology on Parallel and Distributed Processing Laboratory Laboratory of Digitizing Software for Frontier Equipment National University of Defense Technology China

ISBN: (纸本)9798400702051

4K Video Super-Resolution (VSR) presents a challenging task in video processing, as most existing VSR models have high computational complexity, limiting their application to high-resolution videos, particularly for 4K resolution videos. To address this issue, we propose a novel Guided Spatio-Temporal Video Super-Resolution network (GST-VSR) designed to perform 4K VSR on a single GPU. The proposed method comprises two key components: the Spatio-Temporal Alignment Network (STAN) and the Super-resolution Reconstruction Network (SRN), which work together to enhance the quality of the output frames. The STAN is responsible for extracting highly relevant features in frames and aligning the reference frame with the neighboring frames at the feature level to maintain temporal consistency. The SRN fuses high-quality features into the final high-resolution frames. Unlike existing methods, our proposed approach does not require explicit optical flow estimation, making it more efficient and less computationally demanding. To facilitate the training and testing of the compared models, we have established a new dataset, Pixabay-Set, consisting of 145 videos suitable for the 4K VSR task. Experimental results on the test dataset show that the proposed method achieves competitive performance compared to state-of-the-art models. In summary, our proposed GST-VSR network provides an effective solution to the challenging task of 4K VSR. © 2023 Copyright held by the owner/author(s).

关键词： Statistical tests

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：