检索结果-内蒙古大学图书馆

22nd IEEE International Symposium on parallel and distributed Processing with Applications, ISPA 2024

作者： Ma, Shenghong Xu, Jinwei Jiang, Jingfei Wang, Yaohua Li, Dongsheng National University of Defense Technology National Key Laboratory of Parallel and Distributed Computing College of Computer Changsha China

ISBN: (纸本)9798331509712

The self-attention mechanism is the core component of Transformer, which provides a powerful ability to understand the sequence context. However, the self-attention mechanism also suffers from a large amount of redundant computation. Model sparsification can effectively reduce computational load, but the irregularity of non-zeros introduced by sparsification significantly decreases hardware efficiency. This paper proposes Funnel, an accelerator that dynamically predicts sparse attention patterns and efficiently processes unstructured sparse data. Firstly, we adopt a fast quantization method based on lookup table to minimize the cost of sparse patterns prediction. Secondly, we propose Funnel computing Unit (FCU), a hardware architecture that efficiently handles sparse attention through multi-dataflow fusion. Sampled Dense-Dense Matrix Multiplication (SDDMM) and Sparse-Dense Matrix Multiplication (SpMM) are core components of sparse attention mechanism. FCU unifies the computation ways of matrix inner product and row-wise product to support SDDMM and SpMM at the same time, which greatly reduces the storage and movement overhead of intermediate results. Lastly, we devise a lightweight buffer and data tiling strategy tailored to the proposed accelerator, aimed at enhancing data reuse. Experiments demonstrate that our accelerator achieves 0.10-0.25 sparsity with small accuracy loss. When computing the self-attention layer, it attains hardware efficiency ranging from 60% to 85%. Compared to CPU and GPU, it achieves 5.60x and 8.20x speedup. Compared to the state-of-the-art attention accelerators A3, SpAtten, FTRANS, and Sanger, it achieves 7.37x, 4.52x, 9.58x, and 3.08x speedup. © 2024 IEEE.

关键词： FPGA Funnel computing unit Sparse attention Transformer

来源：评论

学校读者我要写书评

暂无评论

A clustering-based approach for mining dockerfile evolutionary trajectories

引用

Science China(Information Sciences) 2019年第1期62卷 211-213页

作者： Yang ZHANG Huaimin WANG Vladimir FILKOV Key Laboratory of Parallel and Distributed Computing National University of Defense Technology College of Computer National University of Defense Technology DECAL Lab University of California Computer Science Department University of California

Dear editor,Docker1), as a de-facto industry standard [1], enables the packaging of an application with all its dependencies and execution environment in a light-weight, self-contained unit, i.e., *** launching the container from Docker image, developers can easily share the same operating system, libraries, and binaries [2]. As the configuration file, the dockerfile plays an important role,

关键词： A clustering-based approach for mining dockerfile evolutionary trajectories

来源：评论

学校读者我要写书评

暂无评论

Recovering Performance for Vector-based Machine Learning on Managed Runtime

引用

ACM SIGPLAN Notices 2017年第8期52卷 457-458页

作者： Wu, Mingyu Guan, Haibing Zang, Binyu Chen, Haibo Shanghai Key Laboratory of Scalable Computing and Systems Institute of Parallel and Distributed Systems Shanghai Jiao Tong University China

来源：评论

学校读者我要写书评

暂无评论

Efficient Large Models Fine-tuning on Commodity Servers via Memory-balanced Pipeline parallelism 25

Efficient Large Models Fine-tuning on Commodity Servers via ...

引用

25th IEEE International Conferences on High Performance computing and Communications, 9th International Conference on Data Science and Systems, 21st IEEE International Conference on Smart City and 9th IEEE International Conference on Dependability in Sensor, Cloud and Big Data Systems and Applications, HPCC/DSS/SmartCity/DependSys 2023

作者： Liu, Yujie Lai, Zhiquan Liu, Weijie Wang, Wei Li, Dongsheng College of Computer National University of Defense Technology National Key Laboratory of Parallel and Distributed Computing Changsha China

ISBN: (纸本)9798350330014

Large models have achieved impressive performance in many downstream tasks. Using pipeline parallelism to fine-tune large models on commodity GPU servers is an important way to make the excellent performance of large models available to the general public. Previous solutions fail to achieve an efficient memory-balanced pipeline parallelism. In this poster, we introduce a memory load-balanced pipeline parallel solution. This solution balances memory consumption across stages on commodity GPU servers via NVLink bridges. It establishes a new pathway to offload data from GPU to CPU by using the PCIe link of adjacent GPUs connected by the NVLink bridge. Furthermore, our method orchestrates offload operations to minimize the offload latency during large model fine-tuning. Experiments demonstrate that our solution can balance the memory footprint among pipeline stages without sacrificing training performance. © 2023 IEEE.

关键词： Program processors

来源：评论

学校读者我要写书评

暂无评论

Rethinking the distributed DNN Training Cluster Design from the Cost-effectiveness View 25

Rethinking the Distributed DNN Training Cluster Design from ...

引用

作者： Lai, Zhiquan Liu, Yujie Wang, Wei Hao, Yanqi Li, Dongsheng College of Computer National University of Defense Technology National Key Laboratory of Parallel and Distributed Computing Changsha China

ISBN: (纸本)9798350330014

As deep learning grows rapidly, model training heavily relies on parallel methods and there exist numerous cluster configurations. However, current preferences for parallel training focus on data centers, overlooking the financial constraints faced by most researchers. To attain the best performance within the cost limitation, we introduce a throughput-cost metric to accurately characterize clusters' cost-effectiveness. Based on this metric, we design a cost-effective cluster featuring the 3090 with NVLink. The experiment results demonstrate that our cluster achieves remarkable cost-effectiveness in various distributed model training schemes. © 2023 IEEE.

关键词： Cost effectiveness

来源：评论

学校读者我要写书评

暂无评论

Area-NeRF: Area-based Neural Radiance Fields 2

Area-NeRF: Area-based Neural Radiance Fields

引用

2nd International Conference on Image Processing, Computer Vision and Machine Learning, ICICML 2023

作者： Ye, Zonxin Li, Wenyu Qiao, Peng Dou, Yong National University of Defense Technology National Key Laboratory of Parallel and Distributed Computing School of Computer Changsha China

ISBN: (纸本)9798350331417

Neural Radiance Field (NeRF) has received widespread attention for its photo-realistic novel view synthesis quality. Current methods mainly represent the scene based on point sampling of ray casting, ignoring the influence of the observed area changing with distance. In addition, The current sampling strategies are all focused on the distribution of sampling points on the ray, without paying attention to the sampling of the ray. We found that the current ray sampling strategy for scenes with the camera moving forward severely reduces the convergence speed. In this work, we extend the point representation to area representation by using relative positional encoding, and propose a ray sampling strategy that is suitable for camera trajectory moving forward. We validated the effectiveness of our method on multiple public datasets. © 2023 IEEE.

关键词： NeRF neural radiance field neural rendering novel view synthesis

来源：评论

学校读者我要写书评

暂无评论

Local-Adaptive Transformer for Multivariate Time Series Anomaly Detection and Diagnosis

Local-Adaptive Transformer for Multivariate Time Series Anom...

引用

2023 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2023

作者： Zhou, Xiaohui Wang, Yijie Xu, Hongzuo Liu, Mingyu Zhang, Ruyi College of Computer National University of Defense Technology National Key Laboratory of Parallel and Distributed Computing Changsha China

ISBN: (纸本)9798350337020

Time series data are pervasive in varied real-world applications, and accurately identifying anomalies in time series is of great importance. Many current methods are insufficient to model long-term dependence, whereas some anomalies can be only identified through long temporal contextual information. This may finally lead to disastrous outcomes due to false negatives of these anomalies. Prior arts employ Transformers (i.e., a neural network architecture that has powerful capability in modeling long-term dependence and global association) to alleviate this problem;however, Transformers are insensitive in sensing local context, which may neglect subtle anomalies. Therefore, in this paper, we propose a local-adaptive Transformer based on cross-correlation for time series anomaly detection, which unifies both global and local information to capture comprehensive time series patterns. Specifically, we devise a cross-correlation mechanism by employing causal convolution to adaptively capture local pattern variation, offering diverse local information into the long-term temporal learning process. Furthermore, a novel optimization objective is utilized to jointly optimize reconstruction of the entire time series and matrix derived from cross-correlation mechanism, which prevents the cross-correlation from becoming trivial in the training phase. The generated cross-correlation matrix reveals underlying interactions between dimensions of multivariate time series, which provides valuable insights into anomaly diagnosis. Extensive experiments on six real-world datasets demonstrate that our model outperforms state-of-the-art competing methods and achieves 6.8%-27.5% F1 score improvement. Our method also has good anomaly interpretability and is effective for anomaly diagnosis. © 2023 IEEE.

关键词：

来源：评论

学校读者我要写书评

暂无评论

High Performance Interconnect Network for Tianhe System

引用

Journal of Computer Science & Technology 2015年第2期30卷 259-272页

作者：廖湘科庞征王克非卢宇彤谢旻夏军董德尊所光 College of Computer National University of Defense Technology Changsha 410073 Science and Technology on Parallel and Distributed Processing Laboratory National Changsha 410073 China China University of Defense Technology State Key Laboratory of High Performance Computing National University of Defense Technology Changsha 410073 China

In this paper, we present the Tianhe-2 interconnect network and message passing services. We describe the architecture of the router and network interface chips, and highlight a set of hardware and software features effectively supporting high performance communications, ranging over remote direct memory access, collective optimization, hardwareenable reliable end-to-end communication, user-level message passing services, etc. Measured hardware performance results are also presented.

关键词： Tianhe-2 supercomputer interconnect network router architecture network interface architecture user-level message passing

来源：评论

学校读者我要写书评

暂无评论

Mbapp: Efficient Memory-Balanced Pipeline parallelism for Large Model Fine-Tuning on Commodity GPU Servers 24

Mbapp: Efficient Memory-Balanced Pipeline Parallelism for La...

引用

5th International Conference on Computer Information and Big Data Applications, CIBDA 2024

作者： Liu, Yujie Lai, Zhiquan Li, Dongsheng National Key Laboratory of Parallel and Distributed Computing College of Computer National University of Defense Technology Changsha410000 China

ISBN: (纸本)9798400718106

Large-scale models have demonstrated outstanding performance across various downstream tasks. Pipeline parallelism is essential for fine-tuning large models on commodity GPU servers, as it plays a crucial role in making their exceptional performance more accessible and widespread. Existing approaches have encountered challenges in attaining effective memory-balanced pipeline parallelism. This paper presents a novel memory load-balanced pipeline parallel solution that aims to distribute memory usage evenly across stages on commodity GPU servers by leveraging NVLink bridges. The solution presents a novel approach for transferring data from GPUs to CPUs through the PCI-e link between adjacent GPUs interconnected by the NVLink bridge. Moreover, our methodology orchestrates data transfer operations to reduce offloading latency during the fine-tuning of large models. The evaluation demonstrates that our approach enhances switching efficiency by 1.46x to 1.76x and improves throughput by 2.04x to 3.59x compared to the PyTorch offloading technique. © 2024 ACM.

关键词： Deep neural networks

来源：评论

学校读者我要写书评

暂无评论

Towards estimating expected sizes of probabilistic skylines

引用

Science China(Information Sciences) 2011年第12期54卷 2574-2584页

作者： YANG YongTao & WANG YiJie national key laboratory for parallel and distributed Processing, School of Computer, national University of Defense Technology, Changsha 410073, China 1. National Key Laboratory for Parallel and Distributed Processing School of Computer National University of Defense Technology Changsha 410073 China

We consider the maximal vector problem on uncertain data, which has been recently posed by the study on processing skyline queries over a probabilistic data stream in the database context. Let D n be a set of n points in a d-dimensional space and q (0 < q 1) be a probability threshold; each point in D n has a probability to occur. Our problem is concerned with how to estimate the expected size of the probabilistic skyline, which consists of all the points that are not dominated by any other point in D n with a probability not less than q. We prove that the upper bound of the expected size is O(min{n, (- ln q)(ln n) d-1 }) under the assumptions that the value distribution on each dimension is independent and the values of the points along each dimension are distinct. The main idea of our proof is to find a recurrence about the expected size and solve it. Our results reveal the relationship between the probability threshold q and the expected size of the probabilistic skyline, and show that the upper bound is poly-logarithmic when q is not extremely small.

关键词： probabilistic skyline probabilistic data streams expected size

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：