Multidimensional parallel training has been widely applied to train large-scale deep learning models such as GPT-3. The efficiency of parameter communication among training devices/processes is often the performance bottleneck of large model training. Analyzing parameter communication modes and traffic therefore provides an important reference for interconnection network design and computing task scheduling aimed at improving training performance. In this paper, we analyze the parameter communication modes in typical 3D parallel training (data parallelism, pipeline parallelism, and tensor parallelism) and model the traffic of each mode. Finally, taking GPT-3 as an example, we present the communication behavior of its 3D parallel training.
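To make the kind of traffic modeling described above concrete, the following is a minimal sketch, not the paper's exact model: it estimates per-iteration communication volume for the three parallel dimensions using standard first-order formulas (ring all-reduce moving roughly 2(N-1)/N of the data, point-to-point pipeline transfers). All parameter values and the 8x8 device split are illustrative assumptions.

```python
# Hedged sketch: rough per-iteration communication volume under 3D parallelism.
# Formulas are standard textbook estimates, not the paper's model; numbers are illustrative.

def data_parallel_bytes(param_count, dp_size, bytes_per_elem=2):
    """Gradient all-reduce volume per replica (ring all-reduce: ~2*(N-1)/N of the data)."""
    return 2 * (dp_size - 1) / dp_size * param_count * bytes_per_elem

def tensor_parallel_bytes(hidden, seq_len, micro_batch, layers, tp_size, bytes_per_elem=2):
    """Activation all-reduces inside each transformer layer (two per layer in
    Megatron-style tensor parallelism: one after attention, one after the MLP)."""
    activation = micro_batch * seq_len * hidden * bytes_per_elem
    per_allreduce = 2 * (tp_size - 1) / tp_size * activation
    return 2 * layers * per_allreduce          # forward only; backward roughly doubles this

def pipeline_parallel_bytes(hidden, seq_len, micro_batch, num_micro_batches, bytes_per_elem=2):
    """Point-to-point activation and gradient transfers at one stage boundary."""
    activation = micro_batch * seq_len * hidden * bytes_per_elem
    return 2 * num_micro_batches * activation  # forward activations + backward gradients

if __name__ == "__main__":
    # Hypothetical GPT-3-like setting: fp16 tensors, 8-way TP x 8-way PP split.
    print("DP :", data_parallel_bytes(175e9 / 64, dp_size=8) / 2**30, "GiB")
    print("TP :", tensor_parallel_bytes(12288, 2048, 1, 96 // 8, tp_size=8) / 2**30, "GiB")
    print("PP :", pipeline_parallel_bytes(12288, 2048, 1, 16) / 2**30, "GiB")
```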
Neural Radiance Field (NeRF) has received widespread attention for its photo-realistic novel view synthesis quality. Current methods mainly represent the scene via point sampling along cast rays, ignoring how the observed area changes with distance. In addition, current sampling strategies focus only on the distribution of sample points along each ray, without considering how the rays themselves are sampled. We find that the prevailing ray sampling strategy severely slows convergence on scenes captured with a forward-moving camera. In this work, we extend the point representation to an area representation using relative positional encoding, and propose a ray sampling strategy suited to forward-moving camera trajectories. We validate the effectiveness of our method on multiple public datasets.
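For context, the sketch below shows only the conventional point-wise frequency positional encoding that the abstract contrasts with; the paper's relative, area-based encoding is not reproduced here. The function name and frequency count are illustrative assumptions.

```python
import numpy as np

def positional_encoding(x, num_freqs=10):
    """Standard NeRF frequency encoding: gamma(x) = (sin(2^k * pi * x), cos(2^k * pi * x))_k.
    This is the point-wise baseline; the paper extends it toward an area
    representation via a relative formulation, which is not shown here."""
    freqs = 2.0 ** np.arange(num_freqs) * np.pi           # (num_freqs,)
    scaled = x[..., None] * freqs                          # (..., dim, num_freqs)
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)                  # (..., dim * 2 * num_freqs)

# Example: encode a small batch of 3D sample points along a ray.
points = np.random.rand(4, 3)
print(positional_encoding(points).shape)   # (4, 60) with num_freqs=10
```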
Large models have achieved impressive performance in many downstream tasks. Using pipeline parallelism to fine-tune large models on commodity GPU servers is an important way to make the excellent performance of large models available to the general public. Previous solutions fail to achieve efficient, memory-balanced pipeline parallelism. In this poster, we introduce a memory load-balanced pipeline parallel solution. It balances memory consumption across stages on commodity GPU servers equipped with NVLink bridges, establishing a new pathway to offload data from GPU to CPU through the PCIe link of the adjacent GPU connected by the NVLink bridge. Furthermore, our method orchestrates offload operations to minimize offload latency during large model fine-tuning. Experiments demonstrate that our solution balances the memory footprint among pipeline stages without sacrificing training performance.
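As a rough illustration of the offloading mechanism involved, here is a minimal PyTorch sketch of asynchronous GPU-to-CPU copies into pinned host memory on a side stream. It shows only the generic overlap of offload traffic with computation; the poster's specific routing over the PCIe link of the NVLink-bridged neighbor GPU is a topology-level detail not expressible here, and the function names are hypothetical.

```python
import torch

# Side stream so offload copies can overlap with compute on the default stream.
offload_stream = torch.cuda.Stream() if torch.cuda.is_available() else None

def offload(tensor: torch.Tensor) -> torch.Tensor:
    """Copy a GPU tensor into pinned host memory on the side stream."""
    cpu_buf = torch.empty(tensor.shape, dtype=tensor.dtype, device="cpu", pin_memory=True)
    offload_stream.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(offload_stream):
        cpu_buf.copy_(tensor, non_blocking=True)
    return cpu_buf

def reload(cpu_buf: torch.Tensor, device: str = "cuda") -> torch.Tensor:
    """Bring an offloaded tensor back to the GPU before it is needed again."""
    return cpu_buf.to(device, non_blocking=True)

if __name__ == "__main__" and torch.cuda.is_available():
    act = torch.randn(1024, 1024, device="cuda")
    host = offload(act)
    torch.cuda.synchronize()      # in practice, synchronize only right before reuse
    print(reload(host).shape)
```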
Instant delivery has become a fundamental service in people's daily lives. Different from the traditional express service, instant delivery has a strict shipping time constraint after an order is placed. However, t...
Broadcast authentication is a critical security service in wireless sensor networks. A protocol named $\mu\text{TESLA}$ [1] has been proposed to provide an efficient authentication service for such networks. However, when applied to applications such as time synchronization and fire alarms, in which broadcast messages are sent infrequently, $\mu\text{TESLA}$ suffers from wasted key resources and slow message verification. This paper presents a new protocol named GBA (Generalized Broadcast Authentication) for efficient broadcast authentication in such applications. GBA retains the one-way key chain mechanism of $\mu\text{TESLA}$, but modifies the association between keys and time intervals and changes the key disclosure mechanism according to the message transmission model of these applications. The proposed technique makes full use of key resources and shortens message verification time to an acceptable level. Analysis and experiments show that GBA is more efficient and practical than $\mu\text{TESLA}$ in applications with various message transmission models.
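The one-way key chain that both $\mu\text{TESLA}$ and GBA build on can be sketched in a few lines. This is a minimal illustration of chain generation and disclosed-key verification only; the interval-to-key association and disclosure timing, where GBA differs from $\mu\text{TESLA}$, are not modeled, and the function names are illustrative.

```python
import hashlib

def generate_key_chain(seed: bytes, length: int) -> list:
    """Keys are produced by repeated hashing of a seed and used in reverse order,
    so a later-disclosed key can authenticate all earlier commitments."""
    chain = [seed]
    for _ in range(length):
        chain.append(hashlib.sha256(chain[-1]).digest())
    return list(reversed(chain))       # chain[0] is the public commitment K_0

def verify_disclosed_key(commitment: bytes, disclosed_key: bytes, index: int) -> bool:
    """A receiver holding K_0 checks a disclosed K_index by hashing it back down."""
    k = disclosed_key
    for _ in range(index):
        k = hashlib.sha256(k).digest()
    return k == commitment

keys = generate_key_chain(b"random-seed", length=8)
print(verify_disclosed_key(keys[0], keys[5], index=5))   # True
```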
ISBN: 9798331509712 (digital), 9798331509729 (print)
Blockchain technology has been extensively utilized in decentralized data-sharing applications, with the immutability of blockchain providing a witness for the circulation of data. However, current blockchain data-sharing solutions still fail to address the simultaneous, multi-keyword screening needs of both the sender and the receiver. Without support for bilateral simultaneous filtering, disclosing the reasons for matching failures could inadvertently expose sensitive user data. The challenge, therefore, is to enable ciphertexts carrying multiple keywords and receivers holding multiple interests to match each other mutually and simultaneously. Building on SE (Searchable Encryption), MABE (Multi-Attribute-Based Encryption), and polynomial fitting, this paper proposes a scheme called DMSA (Decentralized and Multi-keyword selective Sharing and selective Acquisition). The scheme satisfies soundness, enabling ciphertexts carrying multiple keywords and receivers representing multiple interests to match each other simultaneously. Our security analysis confirms that DMSA is secure against chosen-plaintext attacks. Experimental results demonstrate a significant efficiency improvement: a 67% gain over single-keyword data-sharing schemes and a 16% improvement over the existing multi-keyword data-sharing solution.
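As a plaintext-only illustration of the polynomial-fitting ingredient, the sketch below represents a keyword set by a polynomial whose roots are the hashed keywords, so membership reduces to checking a zero evaluation. All of DMSA's cryptographic layers (searchable encryption, multi-attribute-based encryption, bilateral matching) are deliberately omitted; this is an assumption-laden sketch of the underlying idea, not the scheme itself.

```python
import hashlib

PRIME = 2**61 - 1   # illustrative field modulus

def h(word: str) -> int:
    return int.from_bytes(hashlib.sha256(word.encode()).digest(), "big") % PRIME

def keyword_polynomial(keywords):
    """Coefficients (low to high degree) of prod_i (x - h(w_i)) over the field."""
    coeffs = [1]
    for w in keywords:
        root = h(w)
        new = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            new[i] = (new[i] - c * root) % PRIME   # multiply current poly by (x - root)
            new[i + 1] = (new[i + 1] + c) % PRIME
        coeffs = new
    return coeffs

def matches(coeffs, word: str) -> bool:
    """Horner evaluation: zero iff the word is one of the encoded keywords."""
    x, acc = h(word), 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % PRIME
    return acc == 0

sender_poly = keyword_polynomial(["blockchain", "sharing", "privacy"])
print(matches(sender_poly, "privacy"), matches(sender_poly, "auction"))   # True False
```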
Dear editor, The increasing awareness of the potential value hidden in data has resulted in many data mining studies. In the domain of software engineering, for example, developers' behavioral data and code review data have been leveraged on social coding sites to automatically recommend relevant projects [1] and candidate reviewers [2, 3].
Self-training emerges as an important research line on domain adaptation. By taking the model’s prediction as the pseudo labels of the unlabeled data, self-training bootstraps the model with pseudo instances in the t...
The scale of model parameters and the amount of training data are increasing exponentially, and the exponential growth of model parameters demands ever more GPU memory. Recomputation and swapping are the two main memory optimization methods and have been studied extensively, including strategies that combine them. However, most existing approaches rely on heuristic search, which does not explore the complete solution space and cannot guarantee optimal solutions. An optimal search strategy with tensor-level recomputation and swapping is needed for large-scale model training. In this paper, we propose an optimal strategy search algorithm that combines tensor-level recomputation and swapping. Specifically, the memory swapping strategy is reformulated as an optimization problem that converts the memory constraints into a mixed integer program, from which the optimal memory optimization strategy is found. By leveraging the advantages of both recomputation and swapping, this approach minimizes computation overhead without exceeding the available memory. Experimental results show that our method reduces memory requirements during training by about 60%. Furthermore, our method reduces overall training time compared with existing algorithms. Compared to Checkmate, our approach achieves about a 0.3–0.9% reduction in computation cost per iteration.
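To give a flavor of casting the recompute-versus-swap decision as a mixed integer program, here is a deliberately simplified toy sketch using PuLP. The real formulation tracks per-timestep tensor liveness and the overlap of swap transfers with compute; here each tensor simply receives one of {keep, recompute, swap}, only kept tensors count against the memory budget, and all sizes and costs are made-up numbers.

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, PULP_CBC_CMD

tensors = {                      # name: (size_MB, recompute_cost_ms, swap_cost_ms)
    "act0": (512, 3.0, 6.0),
    "act1": (1024, 8.0, 9.0),
    "act2": (2048, 20.0, 12.0),
    "act3": (256, 1.5, 4.0),
}
MEMORY_BUDGET_MB = 2048

prob = LpProblem("recompute_or_swap", LpMinimize)
keep = {t: LpVariable(f"keep_{t}", cat="Binary") for t in tensors}
reco = {t: LpVariable(f"reco_{t}", cat="Binary") for t in tensors}
swap = {t: LpVariable(f"swap_{t}", cat="Binary") for t in tensors}

# Each tensor is handled in exactly one way.
for t in tensors:
    prob += keep[t] + reco[t] + swap[t] == 1

# Crude peak-memory proxy: only tensors kept resident consume GPU memory.
prob += lpSum(tensors[t][0] * keep[t] for t in tensors) <= MEMORY_BUDGET_MB

# Objective: minimize the extra time paid for recomputation and swapping.
prob += lpSum(tensors[t][1] * reco[t] + tensors[t][2] * swap[t] for t in tensors)

prob.solve(PULP_CBC_CMD(msg=False))
for t in tensors:
    choice = "keep" if keep[t].value() else ("recompute" if reco[t].value() else "swap")
    print(t, "->", choice)
```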
Instruction tuning for large language models (LLMs) can drive them to produce results consistent with human goals in specific downstream tasks. However, the process of continual instruction tuning (CIT) for LLMs may b...