Time series data are pervasive in varied real-world applications, and accurately identifying anomalies in time series is of great importance. Many current methods are insufficient to model long-term dependencies, whereas some anomalies can only be identified through long-range temporal context; missing such anomalies (false negatives) can ultimately lead to disastrous outcomes. Prior work employs Transformers (a neural network architecture with strong capability for modeling long-term dependencies and global associations) to alleviate this problem; however, Transformers are insensitive to local context and may therefore miss subtle anomalies. In this paper, we propose a local-adaptive Transformer based on cross-correlation for time series anomaly detection, which unifies global and local information to capture comprehensive time series patterns. Specifically, we devise a cross-correlation mechanism that employs causal convolution to adaptively capture local pattern variation, injecting diverse local information into the long-term temporal learning process. Furthermore, a novel optimization objective jointly optimizes the reconstruction of the entire time series and the matrix derived from the cross-correlation mechanism, which prevents the cross-correlation from becoming trivial during training. The generated cross-correlation matrix reveals underlying interactions between the dimensions of a multivariate time series, providing valuable insights for anomaly diagnosis. Extensive experiments on six real-world datasets demonstrate that our model outperforms state-of-the-art competing methods, achieving a 6.8%-27.5% $F_{1}$ score improvement. Our method also offers good anomaly interpretability and is effective for anomaly diagnosis.
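To make the mechanism concrete, the following is a minimal PyTorch sketch (not the authors' code) of how a causal-convolution branch can produce a cross-correlation matrix alongside ordinary self-attention; the module name LocalAdaptiveAttention and parameters such as d_model and kernel_size are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LocalAdaptiveAttention(nn.Module):
        def __init__(self, d_model=64, kernel_size=3):
            super().__init__()
            self.qkv = nn.Linear(d_model, 3 * d_model)
            # Causal convolution: input is padded only on the left, so position t sees <= t.
            self.causal_conv = nn.Conv1d(d_model, d_model, kernel_size, padding=0)
            self.kernel_size = kernel_size

        def forward(self, x):                                  # x: (batch, length, d_model)
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            # Local branch: causal convolution over time captures local pattern variation.
            x_pad = F.pad(x.transpose(1, 2), (self.kernel_size - 1, 0))
            local = self.causal_conv(x_pad).transpose(1, 2)    # (batch, length, d_model)
            scale = k.shape[-1] ** 0.5
            # Cross-correlation matrix between local features and global keys.
            cross_corr = torch.softmax(local @ k.transpose(1, 2) / scale, dim=-1)
            # Global self-attention as in a vanilla Transformer.
            attn = torch.softmax(q @ k.transpose(1, 2) / scale, dim=-1)
            out = ((attn + cross_corr) / 2) @ v                # fuse global and local weights
            return out, cross_corr                             # cross_corr can also enter the joint loss

    m = LocalAdaptiveAttention()
    out, cc = m(torch.randn(2, 100, 64))                       # cc: (2, 100, 100)

In the paper's formulation, the returned cross-correlation matrix would also be supervised by the joint reconstruction objective so that it does not collapse to a trivial solution during training.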
As deep learning grows rapidly, model training relies heavily on parallel methods, and numerous cluster configurations exist. However, current parallel-training practice targets data centers and overlooks the financial constraints faced by most researchers. To attain the best performance within a cost budget, we introduce a throughput-cost metric that accurately characterizes a cluster's cost-effectiveness. Based on this metric, we design a cost-effective cluster built around NVLink-connected RTX 3090 GPUs. Experimental results demonstrate that our cluster achieves remarkable cost-effectiveness across various distributed model-training schemes.
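As a rough illustration of how such a throughput-cost comparison can be computed (the paper's exact metric is not reproduced here, and every number below is a placeholder):

    def throughput_per_dollar(samples_per_second, cluster_cost_usd):
        """Higher is better: training throughput normalized by hardware cost."""
        return samples_per_second / cluster_cost_usd

    clusters = {
        "8x RTX 3090 + NVLink": (1800.0, 16000.0),    # (samples/s, cost in USD) - placeholders
        "8x A100 server":       (3200.0, 120000.0),
    }
    for name, (tput, cost) in clusters.items():
        print(f"{name}: {throughput_per_dollar(tput, cost):.4f} samples/s per USD")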
Multidimensional parallel training has been widely applied to train large-scale deep learning models such as GPT-3. The efficiency of parameter communication among training devices/processes is often the performance bottleneck of large-model training. Analyzing parameter-communication patterns and traffic provides an important reference for interconnection-network design and computing-task scheduling aimed at improving training performance. In this paper, we analyze the parameter-communication modes in typical 3D parallel training (data parallelism, pipeline parallelism, and tensor parallelism) and model the traffic of the different communication modes. Finally, taking GPT-3 as an example, we characterize the communication in its 3D parallel training.
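The following back-of-the-envelope sketch shows how such per-step volumes are commonly estimated, using the standard ring all-reduce and point-to-point formulas rather than the paper's exact model; the parallel degrees and tensor shapes are placeholders.

    def ring_allreduce_bytes(tensor_bytes, world_size):
        # Each rank transfers roughly 2 * (n - 1) / n of the tensor in a ring all-reduce.
        return 2 * (world_size - 1) / world_size * tensor_bytes

    params = 175e9                           # GPT-3-scale parameter count
    hidden, seq, micro_batch = 12288, 2048, 1
    dp, tp, pp = 8, 8, 16                    # illustrative 3D parallel degrees
    fp16 = 2                                 # bytes per element

    grad_sync = ring_allreduce_bytes(params / (tp * pp) * fp16, dp)    # data-parallel gradient sync
    pp_activation = micro_batch * seq * hidden * fp16                  # pipeline point-to-point transfer
    tp_allreduce = ring_allreduce_bytes(pp_activation, tp)             # per tensor-parallel all-reduce

    print(f"DP gradient sync per step        : {grad_sync / 1e9:.2f} GB per rank")
    print(f"PP activation per microbatch     : {pp_activation / 1e6:.1f} MB")
    print(f"TP all-reduce per layer collective: {tp_allreduce / 1e6:.1f} MB per rank")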
Large models have achieved impressive performance on many downstream tasks. Using pipeline parallelism to fine-tune large models on commodity GPU servers is an important way to make the excellent performance of large models available to the general public. Previous solutions fail to achieve efficient, memory-balanced pipeline parallelism. In this poster, we introduce a memory load-balanced pipeline-parallel solution. It balances memory consumption across stages on commodity GPU servers via NVLink bridges, establishing a new pathway that offloads data from GPU to CPU through the PCIe link of an adjacent GPU connected by the NVLink bridge. Furthermore, our method orchestrates offload operations to minimize offload latency during large-model fine-tuning. Experiments demonstrate that our solution balances the memory footprint among pipeline stages without sacrificing training performance.
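A minimal PyTorch sketch of the offload pathway described above, assuming two GPUs; the device indices (cuda:0, cuda:1) and the side stream are illustrative assumptions, and this is not the poster's implementation.

    import torch

    def offload_via_neighbor(t_gpu0, neighbor="cuda:1"):
        stream = torch.cuda.Stream(device=neighbor)
        with torch.cuda.stream(stream):
            # GPU0 -> GPU1 over the NVLink bridge.
            staged = t_gpu0.to(neighbor, non_blocking=True)
            # GPU1 -> pinned CPU memory over GPU1's PCIe link.
            host = torch.empty(staged.shape, dtype=staged.dtype, device="cpu", pin_memory=True)
            host.copy_(staged, non_blocking=True)
        return host, stream                   # synchronize the stream before reusing host

    if torch.cuda.device_count() >= 2:
        x = torch.randn(1024, 1024, device="cuda:0")
        host, stream = offload_via_neighbor(x)
        stream.synchronize()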
ISBN (digital): 9798350368741
ISBN (print): 9798350368758
Multivariate time series anomaly detection (MTAD) poses a challenge due to temporal and feature dependencies. The critical aspects of improving detection performance lie in accurately capturing the dependencies between variables within the sliding window and effectively leveraging them. Existing studies rely on domain knowledge to pre-set the window size and overlook the strength of dependencies, computing only their direction from variable similarity. This paper proposes GSLTE, a graph structure learning method for MTAD. GSLTE employs the Fast Fourier Transform to iteratively segment the whole series, selecting the dominant Fourier frequency to set the window size for each subsequence, subject to a minimum interval. GSLTE quantifies the direction and strength of dependencies via variable-lag transfer entropy, computed with the Dynamic Time Warping method, to learn asymmetric links between variables. Extensive experiments show that GNN-based MTAD methods applying GSLTE further improve anomaly detection performance and outperform state-of-the-art competitors.
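A sketch of the dominant-frequency step, reconstructed for illustration only; the function name, the minimum-interval handling, and the toy signal are assumptions, not the authors' code.

    import numpy as np

    def dominant_period(series, min_interval=4):
        spectrum = np.abs(np.fft.rfft(series - series.mean()))
        freqs = np.fft.rfftfreq(len(series))
        spectrum[0] = 0.0                              # ignore the DC component
        k = int(np.argmax(spectrum))
        period = int(round(1.0 / freqs[k])) if freqs[k] > 0 else len(series)
        return max(period, min_interval)               # dominant period used as the window size

    t = np.arange(1024)
    x = np.sin(2 * np.pi * t / 64) + 0.1 * np.random.randn(1024)
    print(dominant_period(x))                          # ~64 for this toy signal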
ISBN (digital): 9798331541750
ISBN (print): 9798331541767
The multiplier is an important component of a processor's computing unit. Multiply, multiply-add, and multiply-subtract operations are widely used in various signal-processing algorithms. Based on this, this article extends the RISC-V multiplication instructions and designs a 64-bit multiplier that combines the Booth-2 algorithm with Wallace-tree compression, together with a deep pipeline mechanism to improve performance. Finally, logic simulation and emulation confirm that the outputs are correct. Synthesis results show that, in a 28 nm CMOS process, the MAC unit reaches 2.22 GHz.
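To illustrate the partial-product reduction idea behind such a design, here is a radix-4 (Booth-2) recoding sketch in Python; the actual RTL (Booth-2 encoding feeding a Wallace compression tree in a deep pipeline) is not reproduced here.

    def booth2_partial_products(a, b, n_bits=64):
        """Return the signed partial products whose sum equals a * b (b >= 0, fits in n_bits-1)."""
        digits, products = [], []
        prev = 0                                        # b_{-1} = 0
        for i in range(0, n_bits, 2):
            b0 = (b >> i) & 1
            b1 = (b >> (i + 1)) & 1
            digit = b0 + prev - 2 * b1                  # recoded digit in {-2, -1, 0, 1, 2}
            digits.append(digit)
            products.append((digit * a) << i)           # digit * a * 4^(i/2)
            prev = b1
        return digits, products

    a, b = 123456789, 987654321
    digits, pps = booth2_partial_products(a, b)
    assert sum(pps) == a * b                            # at most n_bits/2 partial products to compress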
Foundation models are in the process of becoming the dominant deep learning technology. Pretraining a foundation model is always time-consuming due to the large scale of both the model parameters and the training dataset. ...
Graph neural networks (GNNs) have become important tools for processing structured graph data and have been successfully applied in multiple graph-based application scenarios. Existing GNN systems adopt sample-based training on large-scale graphs over multiple GPUs. Although they support large-scale graph training, the heavy data-loading overhead of transferring vertex features between CPU and GPU remains a bottleneck. In this work, we propose SCGraph, a method that supports high-speed GPU feature caching. SCGraph classifies graph vertices by out-degree. For high out-degree vertices, SCGraph builds graded caches across different GPUs, enlarging the overall cache capacity through high-speed NVLink data transfer between them. For low out-degree vertices, SCGraph expands training vertices' neighborhoods in advance to regenerate the cache. We evaluate SCGraph against two state-of-the-art industrial GNN frameworks, DGL and PaGraph, on various benchmarks. Experimental results show that SCGraph improves the GPU cache hit rate by up to 23.6% and achieves up to a 1.71x speedup over the state-of-the-art baselines while leaving convergence almost unchanged.
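A simplified sketch of an out-degree-based cache policy in this spirit; the cache budget, the two-way grading across GPUs, and the synthetic data are assumptions for illustration, not SCGraph's implementation.

    import numpy as np

    def build_feature_cache(out_degrees, features, cache_budget, num_gpus=2):
        order = np.argsort(-out_degrees)                   # hottest (highest out-degree) vertices first
        hot = order[:cache_budget * num_gpus]
        # Grade the hot set across GPUs; a remote GPU's share is reachable via NVLink.
        per_gpu = np.array_split(hot, num_gpus)
        return {g: {int(v): features[v] for v in part} for g, part in enumerate(per_gpu)}

    deg = np.random.zipf(1.5, size=10_000).astype(np.int64)
    feat = np.random.randn(10_000, 128).astype(np.float32)
    cache = build_feature_cache(deg, feat, cache_budget=1_000)
    print({g: len(c) for g, c in cache.items()})           # 1000 cached vertices per GPU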
ISBN (digital): 9798331509712
ISBN (print): 9798331509729
Blockchain technology has been extensively utilized in decentralized data-sharing applications, with the immutability of blockchain providing a witness for the circulation of data. However, current blockchain data-sharing solutions still fail to address the simultaneous screening needs of both the sender and the receiver with multiple keywords. Without support for bilateral simultaneous filtering, disclosing the reasons for matching failures could inadvertently expose sensitive user data. The challenge therefore lies in enabling ciphertexts with multiple keywords and receivers with multiple interests to achieve mutual, simultaneous matching. Based on the technical foundations of SE (Searchable Encryption), MABE (Multi-Attribute-Based Encryption), and polynomial fitting, this paper proposes a scheme called DMSA (Decentralized and Multi-keyword selective Sharing and selective Acquisition). The scheme satisfies soundness, enabling ciphertexts carrying multiple keywords and receivers representing multiple interests to match each other simultaneously. We conducted a security analysis confirming that DMSA is secure against chosen-plaintext attacks. Our experimental results demonstrate a significant efficiency improvement: a 67% increase over single-keyword data-sharing schemes and a 16% improvement over the existing multi-keyword data-sharing solution.
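A plaintext-only illustration of the polynomial-fitting idea behind multi-keyword matching: the keyword set is encoded as a polynomial whose roots are keyword hashes, and an interest matches iff the polynomial evaluates to zero at its hash. DMSA performs this under encryption via SE and MABE; the hash choice, the toy prime field, and the example keyword sets below are assumptions, and all cryptography is omitted.

    import hashlib

    P = 2**61 - 1                                    # toy prime modulus

    def h(word):
        return int.from_bytes(hashlib.sha256(word.encode()).digest(), "big") % P

    def fit_keyword_polynomial(keywords):
        """Coefficients (low to high degree) of prod_(w in keywords) (x - h(w)) mod P."""
        coeffs = [1]
        for w in keywords:
            root = h(w)
            coeffs = [(c2 - root * c1) % P
                      for c1, c2 in zip(coeffs + [0], [0] + coeffs)]
        return coeffs

    def matches(coeffs, interest):
        x, acc = h(interest), 0
        for c in reversed(coeffs):                   # Horner evaluation mod P
            acc = (acc * x + c) % P
        return acc == 0

    poly = fit_keyword_polynomial(["blockchain", "sharing", "privacy"])
    print(matches(poly, "privacy"), matches(poly, "auction"))    # True False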
Dear editor, The increasing awareness of the potential value hidden in data has resulted in many data mining studies being conducted. In the domain of software engineering, for example, developers' behavioral data and code review data have been leveraged in social coding sites to automatically recommend relevant projects [1] and candidate reviewers [2, 3].