Transformer models, such as BERT, GPT, and ViT, have been applied to a wide range of areas in recent years due to their efficacy. To improve the training efficiency of Transformer models, various distributed training approaches have been proposed, such as Megatron-LM [8]. However, when multi-dimensional parallelism strategies are considered, existing works cannot harmonize the different strategies well enough to obtain a globally optimal solution because of the complexity involved. In this paper, we propose PTIP, a parallelism strategy searching algorithm that generates operator-level parallelism strategies combining three schemes: data parallelism, tensor parallelism, and pipeline parallelism. PTIP abstracts these three parallelism schemes simultaneously into an auxiliary graph, reformulates the search problem as a mixed-integer programming (MIP) problem, and uses a MIP solver to obtain a high-quality multi-dimensional strategy. Experiments conducted on Transformers demonstrate that PTIP obtains a 13.9%–24.7% performance improvement compared to Megatron-LM [8].
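To make the MIP reduction concrete, the toy sketch below selects one parallelism scheme per operator with the PuLP solver interface. It is not PTIP's actual auxiliary-graph formulation; the operators, strategies, cost and memory numbers are invented for illustration.

```python
# A minimal sketch of casting operator-level strategy selection as a MIP.
# Costs and memory footprints are hypothetical, not profiled values.
import pulp

operators = ["attention", "mlp", "embedding"]
strategies = ["data_parallel", "tensor_parallel", "pipeline_parallel"]

# Hypothetical estimated time (ms) and memory (GB) per (operator, strategy).
cost = {(o, s): 1.0 + 0.1 * i + 0.2 * j
        for i, o in enumerate(operators) for j, s in enumerate(strategies)}
mem = {(o, s): 2.0 - 0.3 * j
       for o in operators for j, s in enumerate(strategies)}
MEM_BUDGET = 5.0  # assumed per-device memory budget (GB)

prob = pulp.LpProblem("parallelism_strategy_search", pulp.LpMinimize)
# x[o, s] = 1 if operator o uses strategy s.
x = {(o, s): pulp.LpVariable(f"x_{o}_{s}", cat="Binary")
     for o in operators for s in strategies}

# Objective: total estimated execution time of the chosen strategies.
prob += pulp.lpSum(cost[o, s] * x[o, s] for o in operators for s in strategies)

# Each operator picks exactly one strategy.
for o in operators:
    prob += pulp.lpSum(x[o, s] for s in strategies) == 1

# The chosen strategies must fit the memory budget.
prob += pulp.lpSum(mem[o, s] * x[o, s] for o in operators for s in strategies) <= MEM_BUDGET

prob.solve(pulp.PULP_CBC_CMD(msg=False))
plan = {o: s for (o, s), var in x.items() if var.value() and var.value() > 0.5}
print(plan)
```

A realistic formulation would additionally encode communication costs between adjacent operators on the auxiliary graph; the sketch keeps only per-operator costs and a memory budget to stay short.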
Recent advances in single-cell RNA sequencing (scRNA-seq) technology provide unprecedented opportunities for reconstructing gene regulatory networks (GRNs). Many models have been proposed to infer GRNs from large volumes of RNA-seq data, but most deep learning models rely on a prior gene regulatory network to infer potential GRNs. Reconstructing GRNs from scRNA-seq data remains challenging due to the noise and sparsity introduced by the dropout effect. Here, we propose GAALink, a novel unsupervised deep learning method. It first constructs a gene similarity matrix and then refines it with a threshold value. It then learns feature representations of genes through a graph attention autoencoder that propagates information across genes with different weights. Finally, we use the learned gene features for matrix completion so that the GRNs are reconstructed. Compared with seven existing GRN reconstruction methods, GAALink achieves more accurate performance on seven scRNA-seq datasets with four ground-truth networks. GAALink provides a useful tool for inferring GRNs from scRNA-seq expression data.
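As a rough illustration of the first two steps (similarity matrix construction and threshold refinement), the sketch below uses a toy expression matrix, Pearson correlation as the similarity measure, and an arbitrary cutoff; none of these choices are taken from GAALink itself.

```python
# A minimal sketch: build a gene-gene similarity matrix from scRNA-seq
# expression and refine it with a threshold to obtain an adjacency matrix.
import numpy as np

rng = np.random.default_rng(0)
expr = rng.poisson(1.0, size=(200, 1000)).astype(float)  # genes x cells (toy data)

# Pearson correlation between genes as one common similarity measure.
sim = np.corrcoef(expr)           # shape: (n_genes, n_genes)
np.fill_diagonal(sim, 0.0)        # ignore self-similarity

# Refine by threshold: keep only sufficiently similar gene pairs as edges.
threshold = 0.1                   # assumed cutoff
adj = (np.abs(sim) >= threshold).astype(float)
print("edges:", int(adj.sum()) // 2)
```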
Discourse structure analysis has been shown to be useful for many artificial intelligence (AI) tasks such as text summarization and text categorization. However, for the Chinese news domain, discourse structure analysis systems are still immature due to the lack of expert-annotated datasets. In this paper, we present CNA, a Chinese news corpus containing 1155 news articles annotated by human experts, covering four domains and four news media sources. We then implement several text classification methods as baselines. Experimental results demonstrate that document-level methods achieve better performance, and we further propose a document-level neural network model with multiple sentence features that achieves state-of-the-art performance. Finally, we analyze the content type distribution of each sentence in CNA and the prediction errors our model makes on the test set. The code and dataset will be open-sourced at https://***/gzl98/Chinese_Discourse_Profiling.
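A hedged sketch of what a document-level model with per-sentence predictions can look like is given below; the embedding size, hidden size, and number of content-type labels are assumptions, and the architecture is a generic illustration rather than the model proposed in the paper.

```python
# Sketch of a document-level sentence classifier: sentence embeddings
# (assumed precomputed) are contextualized with a BiGRU over the whole
# document before a label is predicted for each sentence.
import torch
import torch.nn as nn

class DocSentenceClassifier(nn.Module):
    def __init__(self, sent_dim=768, hidden=256, num_labels=8):
        super().__init__()
        self.context = nn.GRU(sent_dim, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_labels)

    def forward(self, sent_embs):          # (batch, n_sents, sent_dim)
        ctx, _ = self.context(sent_embs)   # (batch, n_sents, 2*hidden)
        return self.classifier(ctx)        # per-sentence logits

# Toy usage: one document with 12 sentences, 768-d sentence embeddings.
model = DocSentenceClassifier()
logits = model(torch.randn(1, 12, 768))
print(logits.shape)  # torch.Size([1, 12, 8])
```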
Offline imitation learning (OIL) is often used to solve complex continuous decision-making tasks. For tasks such as robot control and autonomous driving, it is either difficult to design an effective reward ...
The size of deep learning models has been increasing to enhance model quality. The linear increase in training computation budget with model size means that training an extremely large-scale model is exceedingly time-consuming. Recently, the Mixture of Experts (MoE) has drawn significant attention as it can scale models to extra-large sizes with a stable computation budget. However, inefficient distributed training of large-scale MoE models hinders their broader application. Specifically, a considerable dynamic load imbalance occurs among devices during training, significantly reducing throughput. Several load-balancing works have been proposed to address this challenge. System-level solutions draw more attention than algorithm-level ones for their hardware affinity and non-disruption of model convergence. However, they suffer from high communication costs and poor communication-computation overlapping. To address these challenges, we propose a systematic load-balancing method, Pro-Prophet, which consists of a planner and a scheduler for efficient parallel training of large-scale MoE models. To adapt to the dynamic load imbalance, we profile training statistics and use them to design Pro-Prophet. To lower communication volume, the Pro-Prophet planner determines a series of lightweight load-balancing strategies and efficiently searches for a communication-efficient one based on the statistics. To sufficiently overlap communication and computation, the Pro-Prophet scheduler schedules data-dependent operations based on the statistics and operation features, further improving training throughput. We conduct extensive experiments on four clusters and five MoE models. The results indicate that Pro-Prophet achieves up to 2.66x speedup compared to two popular MoE frameworks, DeepSpeed-MoE and FasterMoE. Furthermore, Pro-Prophet demonstrates a load-balancing improvement of up to 11.01x compared to a representative load-balancing work.
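One simple way to picture a lightweight, statistics-driven load-balancing strategy is a greedy placement of experts onto devices based on profiled token loads, as in the sketch below. The loads, device count, and greedy rule are illustrative and are not Pro-Prophet's actual planner.

```python
# Greedy longest-processing-time placement: assign the heaviest expert to
# the currently least-loaded device, repeatedly.
import heapq

def balance_experts(expert_loads, num_devices):
    heap = [(0, d) for d in range(num_devices)]   # (current load, device id)
    heapq.heapify(heap)
    placement = {}
    for expert, load in sorted(expert_loads.items(), key=lambda kv: -kv[1]):
        dev_load, dev = heapq.heappop(heap)
        placement[expert] = dev
        heapq.heappush(heap, (dev_load + load, dev))
    return placement

# Toy profiled token counts for 8 experts, placed on 4 devices.
loads = {f"expert_{i}": tokens
         for i, tokens in enumerate([900, 120, 400, 310, 700, 90, 260, 520])}
print(balance_experts(loads, num_devices=4))
```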
Representation learning on textual networks, or textual network embedding, which leverages the rich textual information associated with the network structure to learn low-dimensional embeddings of vertices, has been useful in a variety of tasks. However, most approaches learn textual network embeddings using only direct neighbors. In this paper, we employ a powerful and spatially localized operation, personalized PageRank (PPR), to remove the restriction of using only direct connection relationships. We also analyze the relationship between PPR and spectral-domain theory, which provides insight into the empirical performance boost. Experiments show that the proposed method yields a substantial improvement on link prediction tasks compared to existing methods, achieving new state-of-the-art results on several real-world benchmark datasets.
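For reference, personalized PageRank itself can be computed by a short power iteration; the sketch below uses a toy adjacency matrix and a standard restart probability, independent of the paper's specific embedding model.

```python
# Personalized PageRank (PPR) by power iteration on a row-normalized
# adjacency matrix: the walk restarts at the seed node with probability alpha.
import numpy as np

def personalized_pagerank(adj, seed, alpha=0.15, iters=100):
    n = adj.shape[0]
    deg = adj.sum(axis=1, keepdims=True)
    # Row-stochastic transition matrix; isolated nodes get all-zero rows.
    P = np.divide(adj, deg, out=np.zeros_like(adj, dtype=float), where=deg > 0)
    e = np.zeros(n)
    e[seed] = 1.0                            # restart distribution
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = alpha * e + (1 - alpha) * (P.T @ r)
    return r

# Toy 4-node undirected graph as a symmetric adjacency matrix.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(personalized_pagerank(A, seed=0).round(3))
```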
Monte Carlo (MC) simulation plays a key role in radiotherapy. Since the simulation time of MC programs cannot fully meet clinical requirements, we use the ARM-based FT-2000+ multi-core processor for paralleliza...
The matrix multiplication-based convolution algorithm, which can efficiently implement convolutions with different parameters, is the first choice for convolution performance optimization on a given chip. Based on t...
ISBN: 9798350359312 (digital), 9798350359329 (print)
Temporal Knowledge Graph Completion (TKGC) aims to predict the missing parts of quadruples, which is crucial for real-life knowledge graphs. Compared with methods that only use graph neural networks, the emergence of pre-trained language models has introduced a trend of leveraging text and graph structure information simultaneously. However, most current methods based on pre-trained models struggle to effectively utilize text and multi-hop graph structure information concurrently, resulting in insufficient association mining among relations. To address this challenge, we propose a novel model: Temporal Closing Path for Pre-trained Language Model-based TKGC (TCP-PLM). We obtain the temporal closing relation path of the target relation through sampling, and use the relation path as a bridge to exploit text and multi-hop graph structure information simultaneously. Moreover, the relation path serves as a tool for mining associations between relations. Furthermore, because the relation paths are designed to be entity-independent, our model can also handle the inductive setting. Our experiments on three benchmarks, along with extensive analysis, demonstrate that our model not only achieves substantial performance improvements across four metrics compared to other models but also adeptly handles inductive settings.
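To give a sense of what sampling an entity-independent temporal closing relation path might involve, the sketch below runs a breadth-first search over edges whose timestamps fall in a window around the query time and returns only the relation labels; the toy quadruples, window size, and hop limit are assumptions rather than the paper's exact procedure.

```python
# Sample a relation path between the head and tail of a target quadruple,
# restricted to edges whose timestamps are close to the query time, and
# keep only relation labels so the path is entity-independent.
from collections import defaultdict, deque

def sample_relation_path(quads, head, tail, query_time, window=2, max_hops=3):
    graph = defaultdict(list)                     # entity -> [(relation, neighbor)]
    for h, r, t, ts in quads:
        if abs(ts - query_time) <= window:
            graph[h].append((r, t))
    frontier = deque([(head, [])])
    seen = {head}
    while frontier:
        node, rel_path = frontier.popleft()
        if node == tail and rel_path:
            return rel_path                       # relation labels only
        if len(rel_path) >= max_hops:
            continue
        for rel, nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, rel_path + [rel]))
    return None

# Toy temporal KG: (head, relation, tail, timestamp).
quads = [("A", "visits", "B", 3), ("B", "meets", "C", 4), ("A", "calls", "C", 9)]
print(sample_relation_path(quads, head="A", tail="C", query_time=4))
```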
Graphs are a significant data structure for describing relationships between entities, and many real-world application domains depend heavily on graph data. However, graph applications differ greatly from traditional applications, and running them on general-purpose platforms is inefficient, which has motivated research on dedicated graph processing platforms. In this survey, we systematically categorize graph workloads and applications, and provide a detailed review of existing graph processing platforms, dividing them into general-purpose and specialized systems. We thoroughly analyze the implementation technologies, including programming models, partitioning strategies, communication models, execution models, and fault tolerance strategies. Finally, we analyze recent advances and present four open problems for future research.