The scale of model parameters and the amount of training data are growing exponentially, and GPU memory requirements grow with them. Recomputation and swapping are the two main memory optimization methods that have been extensively studied, and there are also strategies that combine the two. However, most of them are based on heuristic search, which does not explore the complete solution space and cannot guarantee an optimal solution. An optimal search strategy with tensor-level recomputation and swapping is therefore desirable for large-scale model training. In this paper, we propose an optimal strategy-search algorithm combining tensor-based recomputation and swapping. Specifically, the memory swapping strategy is reformulated as an optimization problem that converts the memory constraints into a mixed integer program, from which the optimal memory optimization strategy is found. By leveraging the advantages of both recomputation and swapping, this approach minimizes computation overhead without exceeding the available memory. Experimental results show that our method reduces memory requirements during training by about 60%. Furthermore, it reduces overall training time beyond existing algorithms. Compared to Checkmate, our approach achieves about a 0.3–0.9% reduction in computation cost per iteration.
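To make the mixed-integer-programming view concrete, here is a minimal sketch of how a per-tensor keep/recompute/swap choice can be posed as a MIP with a memory-budget constraint, using the PuLP solver. The tensor sizes, costs, and budget are illustrative placeholders, and the formulation is a simplified stand-in for the paper's actual model, not its exact encoding.

```python
import pulp

# Illustrative per-tensor data: size in MB, recompute cost (ms), swap cost (ms).
tensors = {
    "act1": (512, 4.0, 9.0),
    "act2": (768, 6.0, 13.0),
    "act3": (256, 2.5, 4.5),
    "act4": (1024, 8.0, 17.0),
}
memory_budget_mb = 1500  # hypothetical budget for resident activations

prob = pulp.LpProblem("tensor_recompute_swap", pulp.LpMinimize)
keep = {t: pulp.LpVariable(f"keep_{t}", cat="Binary") for t in tensors}
reco = {t: pulp.LpVariable(f"recompute_{t}", cat="Binary") for t in tensors}
swap = {t: pulp.LpVariable(f"swap_{t}", cat="Binary") for t in tensors}

# Objective: minimize the extra time paid for recomputation and swapping.
prob += pulp.lpSum(tensors[t][1] * reco[t] + tensors[t][2] * swap[t]
                   for t in tensors)

# Each tensor is handled in exactly one way.
for t in tensors:
    prob += keep[t] + reco[t] + swap[t] == 1

# Tensors kept resident must fit in the memory budget.
prob += pulp.lpSum(tensors[t][0] * keep[t] for t in tensors) <= memory_budget_mb

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for t in tensors:
    choice = max((keep[t], "keep"), (reco[t], "recompute"), (swap[t], "swap"),
                 key=lambda v: v[0].value())[1]
    print(t, "->", choice)
```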
When a large-scale distributed interactive simulation system runs on a WAN, the sites are usually geographically dispersed over a wide area, which makes it hard to accurately synchronize the simulation clock of each site with those of the other sites. The asynchronous clocks and the large transmission latency on the WAN make it difficult for large-scale simulations to preserve real-time causal order delivery of received events at each site. In this article, we first analyze an indirect way to compare the values of asynchronous simulation clocks, and then propose a novel scheme that selects reconstructible causal control information for each message so as to ensure the causal ordering of events in real time. Experiments demonstrate that the scheme can weaken the effect of network latency, reduce the transmission overhead of control information, and improve causal order consistency in asynchronous distributed simulations.
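For context, the sketch below shows the standard vector-clock delivery condition for causal ordering, which schemes like the one above aim to improve on; it is a baseline illustration with assumed class and method names, not the proposed reconstructible-control-information scheme.

```python
from collections import deque

class CausalReceiver:
    """Deliver a message from site j carrying vector clock V only when
    V[j] == local[j] + 1 and V[k] <= local[k] for every other site k."""

    def __init__(self, site_id, num_sites):
        self.site_id = site_id
        self.clock = [0] * num_sites
        self.pending = deque()

    def _deliverable(self, sender, vclock):
        if vclock[sender] != self.clock[sender] + 1:
            return False
        return all(vclock[k] <= self.clock[k]
                   for k in range(len(self.clock)) if k != sender)

    def receive(self, sender, vclock, payload):
        # Buffer the message, then deliver everything whose causal
        # predecessors have already been delivered.
        self.pending.append((sender, vclock, payload))
        delivered, progress = [], True
        while progress:
            progress = False
            for msg in list(self.pending):
                s, v, p = msg
                if self._deliverable(s, v):
                    self.clock[s] = v[s]
                    delivered.append(p)
                    self.pending.remove(msg)
                    progress = True
        return delivered
```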
Large-scale floating-point matrix multiplication is a fundamental kernel in many scientific and engineering applications. Most existing work focuses only on accelerating matrix multiplication on FPGA by adopting a linea...
Deep clustering aims to cluster unlabeled data by embedding it into a subspace with a deep model. The key challenge of deep clustering is to learn discriminative representations for high-dimensional input data. In this paper, we present a deep discriminative clustering network for clustering real-world images. We use a convolutional auto-encoder stacked with a softmax layer to predict clustering assignments. To learn discriminative representations, the proposed approach adds a discriminative loss as an embedded regularization based on relative entropy minimization. With the discriminative loss, the network can not only produce clustering assignments but also learn discriminative features by reducing intra-cluster distance and increasing inter-cluster distance. We evaluate the proposed method on three datasets: MNIST-full, YTF and FRGC-v2.0. We outperform state-of-the-art results on MNIST-full and FRGC-v2.0 and achieve a competitive result on YTF. The source code has been made publicly available at .
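As an illustration of a discriminative regularizer built on relative entropy over softmax cluster assignments, the sketch below shows one plausible form of such a loss (confident per-sample assignments, balanced clusters). The paper's exact loss may differ; the function name and inputs are assumptions.

```python
import numpy as np

def discriminative_loss(softmax_probs, eps=1e-12):
    """softmax_probs: (batch, n_clusters) soft assignments from the network."""
    p = np.clip(softmax_probs, eps, 1.0)
    # Conditional entropy: push each sample toward a single cluster
    # (smaller intra-cluster spread in prediction space).
    cond_entropy = -np.mean(np.sum(p * np.log(p), axis=1))
    # Marginal entropy: keep cluster usage balanced across the batch
    # (well-separated, non-collapsed clusters).
    marginal = np.clip(p.mean(axis=0), eps, 1.0)
    marg_entropy = -np.sum(marginal * np.log(marginal))
    # Minimizing (conditional - marginal) sharpens and balances assignments.
    return cond_entropy - marg_entropy

fake_assignments = np.random.dirichlet(np.ones(10), size=32)  # toy batch
print(discriminative_loss(fake_assignments))
```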
DSP processors can be used to solve high-performance computation problems, as they combine high computing performance with low power consumption. Matrix multiplication is the kernel of many scientific and engineering computations, so it is important in both theory and practice. Based on a general purpose DSP (GPDSP), a new parallel algorithm for matrix multiplication is proposed, and a peak performance model for matrix multiplication is built. From the peak performance model, an architecture of the GPDSP is derived, and the parameters of a Tflops-level GPDSP are given, including the number of pipelines, the number of SIMD registers, and the bandwidth and latency of the hierarchical memories.
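As a rough illustration of what a peak performance model involves, the sketch below estimates attainable GFLOPS from hypothetical GPDSP parameters and the arithmetic intensity of a blocked matrix multiply. All numbers and parameter names are illustrative assumptions, not the configuration derived in the paper.

```python
def peak_gflops(macs_per_lane, simd_lanes, num_cores, freq_ghz):
    # Each multiply-accumulate counts as two floating-point operations.
    return 2 * macs_per_lane * simd_lanes * num_cores * freq_ghz

def attainable_gflops(peak, mem_bandwidth_gbs, flops_per_byte):
    # The kernel becomes memory-bound when peak exceeds bandwidth * intensity.
    return min(peak, mem_bandwidth_gbs * flops_per_byte)

# Hypothetical GPDSP parameters.
peak = peak_gflops(macs_per_lane=4, simd_lanes=16, num_cores=8, freq_ghz=1.2)

# A blocked B x B x B matrix multiply does 2*B^3 flops on ~3*B^2 loaded
# elements, so arithmetic intensity grows with the tile size that fits on chip.
tile = 64
intensity = (2 * tile**3) / (3 * tile**2 * 4)   # flops per byte, 4-byte floats
print(peak, attainable_gflops(peak, mem_bandwidth_gbs=100.0,
                              flops_per_byte=intensity))
```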
In this paper, we present OpenMedIA, an open-source toolbox library containing a rich set of deep learning methods for medical image analysis under heterogeneous Artificial Intelligence (AI) computing platforms. Vario...
The document-level event extraction task has achieved significant progress with template generation methods. However, existing template-based generation methods impose no reasonable regulation or restriction on the output, which makes the generation results uncontrollable: in some scenarios the model generates entities that do not belong to the input text, or generates template content repeatedly. This stems from the differing natures of the extraction task and the generation task. To this end, we propose a controllable template generation model for event extraction. According to the characteristics of the template generation and event extraction tasks, the model devises a copy mechanism, an inhibition mechanism and a rejection mechanism under an appropriately constructed template. Our model achieves state-of-the-art results on the MUC-4 dataset, and experimental analysis demonstrates the effectiveness of each proposed mechanism.
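To illustrate the flavor of a copy mechanism that keeps generated entities inside the input text, here is a pointer-generator style sketch; it does not reproduce the paper's actual copy, inhibition, or rejection mechanisms, and all names and values are illustrative.

```python
import numpy as np

def copy_augmented_distribution(vocab_dist, attn_weights, src_token_ids, p_gen):
    """Mix the decoder's vocabulary distribution with a distribution that
    copies tokens from the source document via attention weights."""
    final = p_gen * vocab_dist.copy()
    for attn, tok in zip(attn_weights, src_token_ids):
        final[tok] += (1.0 - p_gen) * attn   # copy mass only for source tokens
    return final / final.sum()

vocab_size = 50
vocab_dist = np.random.dirichlet(np.ones(vocab_size))    # decoder output
src_ids = np.array([3, 7, 7, 12, 30])                    # tokens in the document
attn = np.random.dirichlet(np.ones(len(src_ids)))        # decoder attention
out = copy_augmented_distribution(vocab_dist, attn, src_ids, p_gen=0.4)
print(out.argmax(), round(out.sum(), 6))
```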
How to preserve causal and totally ordered event delivery is an important issue in real-time serverless DVEs (Distributed Virtual Environments). However, most related works are designed to maintain only causal order, or to maintain timestamped order with intensive computation and bandwidth overhead. In this paper, we propose a novel distributed algorithm to maintain the before-and-after relationship between events of a DVE, both causal and concurrent, at each individual node. Several simulation experiments are carried out to evaluate the performance of our algorithm, and the results demonstrate that it is effective in preserving causal and totally ordered event delivery and more efficient than previous algorithms.
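For comparison, the sketch below shows a classic baseline for totally ordered delivery using Lamport timestamps with a node-id tie-break: every node sorts events by (timestamp, sender), which also respects causality because a causally later event always carries a larger timestamp. It does not capture concurrency explicitly and is not the proposed algorithm; names are assumptions.

```python
import heapq

class TotalOrderNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.lamport = 0
        self.queue = []          # min-heap keyed by (timestamp, sender)

    def send_event(self, payload):
        # Local events advance the Lamport clock before being timestamped.
        self.lamport += 1
        return (self.lamport, self.node_id, payload)

    def receive(self, timestamp, sender, payload):
        # Receiving an event advances the clock past the event's timestamp.
        self.lamport = max(self.lamport, timestamp) + 1
        heapq.heappush(self.queue, (timestamp, sender, payload))

    def deliver_up_to(self, stable_timestamp):
        """Deliver events whose (timestamp, sender) key is stable, i.e. no
        node can still send an event that would sort before them."""
        out = []
        while self.queue and self.queue[0][0] <= stable_timestamp:
            out.append(heapq.heappop(self.queue)[2])
        return out
```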
We present Fast-Downsampling MobileNet (FD-MobileNet), an efficient and accurate network for very limited computational budgets (e.g., 10-140 MFLOPs). Our key idea is applying a fast downsampling strategy to the MobileNet framework. In FD-MobileNet, we perform 32× downsampling within 12 layers, only half the layers of the original MobileNet. This design brings three advantages: (i) it remarkably reduces the computational cost; (ii) it increases the information capacity and achieves significant performance improvements; (iii) it is engineering-friendly and provides fast actual inference speed. Experiments on the ILSVRC 2012 and PASCAL VOC datasets demonstrate that FD-MobileNet consistently outperforms MobileNet and achieves comparable results with ShuffleNet under different computational budgets, for instance surpassing MobileNet by 5.5% in ILSVRC 2012 top-1 accuracy and 8.3% in VOC 2007 mAP under a complexity of 12 MFLOPs. On an ARM-based device, FD-MobileNet achieves a 1.11× inference speedup over MobileNet and a 1.82× speedup over ShuffleNet under the same complexity.
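A back-of-the-envelope sketch of why fast downsampling saves computation: the multiply-add count of a depthwise-separable convolution scales with the spatial resolution of its input, so reaching a small resolution in fewer layers removes most of the cost. The channel counts below are illustrative, not FD-MobileNet's actual configuration.

```python
def depthwise_separable_madds(h, w, c_in, c_out, k=3):
    depthwise = h * w * c_in * k * k      # per-channel spatial filtering
    pointwise = h * w * c_in * c_out      # 1x1 channel mixing
    return depthwise + pointwise

# The same layer evaluated at 56x56 versus 28x28 input resolution:
# downsampling one stage earlier cuts its multiply-adds by about 4x.
print(depthwise_separable_madds(56, 56, 64, 128))
print(depthwise_separable_madds(28, 28, 64, 128))
```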
Due to the large message transmission latency in Distributed Virtual Environments (DVEs) on a Wide Area Network (WAN), the effectiveness of causality consistency control of message ordering is determined not only by the causal order of messages but also by their timeliness. If only causal order is considered, the real-time property of DVEs may not be ensured because of unlimited waiting for delayed messages; if only timeliness is emphasized, too many delayed messages may have to be discarded to maintain the quality of causal message ordering. Therefore, a trade-off between the quality of causal order delivery and timeliness is necessary for DVEs. In this article, a novel causality-based message ordering approach is presented that dynamically balances the demands of causal order delivery and real-time delivery. Experiment results demonstrate that the approach can enhance the quality of causality while simultaneously keeping the real-time property of DVEs.
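To illustrate the kind of trade-off described, the sketch below waits for a message's causal predecessors only up to a deadline that adapts to recent network latency, delivering (or giving up) once the budget expires instead of blocking. It is a hypothetical illustration with assumed names and thresholds, not the paper's control algorithm.

```python
import time

class DeadlineCausalBuffer:
    def __init__(self, base_wait_s=0.05, alpha=0.9):
        self.wait = base_wait_s       # current waiting budget for late messages
        self.alpha = alpha            # smoothing factor for the latency estimate
        self.latency_est = base_wait_s

    def update_latency(self, observed_latency_s):
        # Exponentially weighted estimate of one-way network latency.
        self.latency_est = (self.alpha * self.latency_est
                            + (1 - self.alpha) * observed_latency_s)
        # Waiting much longer than a couple of latencies hurts real-timeness.
        self.wait = min(2 * self.latency_est, 0.2)

    def should_deliver(self, arrival_time_s, predecessors_arrived):
        """Deliver in causal order when possible; otherwise deliver once the
        waiting budget has expired rather than block the simulation."""
        if predecessors_arrived:
            return True
        return time.monotonic() - arrival_time_s > self.wait
```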