Search Results - Inner Mongolia University Library

2017 2nd International Conference on Software, Multimedia and Communication Engineering (SMCE 2017)

Authors: Fang MA, Shao-he LV, Ke-xin ZHENG, Chi JIN, Fei CHEN, Ke YANG and Yong DOU. Affiliations: National Laboratory for Parallel and Distributed Processing, National University of Defense Technology; University of South China, School of Computer Science and Technology

Image annotation generates a set of semantic labels that describe the contents of an input *** deep learning techniques have achieved significant success in many areas of image *** this paper, we present a multi-label image annotation method that combines unsupervised object hypotheses generation and deep neural *** an image, object hypotheses are generated in an unsupervised *** we extract the image features for each hypothesis with a deep neural network *** combining the features of all hypotheses, we get the features of the entire ***, we calculate for each label the probability that the label is correlated with the given *** can be trained in an end-to-end way using the standard backward propagation *** results on multiple benchmark datasets show that our method is better than the state-of-the-art ones.

Keywords: Deep learning; Multi-label annotation; Object hypotheses


Detailed and clock-driven simulation for HPC interconnection network

Frontiers of Computer Science, 2016, Vol. 10, No. 5, pp. 797-811

Authors: Wenhao ZHOU, Juan CHEN, Chen CUI, Qian WANG, Dezun DONG, Yuhua TANG. Affiliations: State Key Laboratory of High Performance Computing, School of Computer, National University of Defense Technology, Changsha 410073, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China

Performance and energy consumption of high performance computing (HPC) interconnection networks have a great significance in the whole supercomputer, and building up an HPC interconnection network simulation platform is very important for the research on HPC software and hardware technologies. To effectively evaluate the performance and energy consumption of HPC interconnection networks, this article designs and implements a detailed and clock-driven HPC interconnection network simulation platform, called HPC-NetSim. HPC-NetSim uses application-driven workloads and inherits the characteristics of the detailed and flexible cycle-accurate network simulator. Besides, it offers a large set of configurable network parameters in terms of topology and routing, and supports router's on/off states. We compare the simulated execution time with the real execution time of Tianhe-2 subsystem and the mean error is only 2.7%. In addition, we simulate the network behaviors with different network structures and low-power modes. The results are also consistent with the theoretical analyses.

Keywords: high performance computing; clock-driven simulation; interconnection network; BookSim


Optimizing guest swapping using elastic and transparent memory provisioning on virtualization platform

Frontiers of Computer Science, 2016, Vol. 10, No. 5, pp. 908-924

Authors: Xi LI, Pengfei ZHANG, Rui CHU, Huaimin WANG. Affiliations: School of Information Science and Engineering, Central South University, Changsha 410083, China; National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha 410008, China

On virtualization platforms, peak memory demand caused by hotspot applications often triggers page swapping in the guest OS, causing performance degradation inside and outside of this virtual machine (VM). Even though the host holds sufficient memory pages, the guest OS is unable to utilize free pages in the host directly due to the semantic gap between the virtual machine monitor (VMM) and the guest operating system (OS). Our work aims at utilizing the free memory scattered in multiple hosts in a virtualization environment to improve the performance of guest swapping in a transparent and implicit way. Based on an analysis of the behavioral characteristics of guest swapping, we design and implement a distributed and scalable framework, HybridSwap. It dynamically constructs virtual swap pools using various policies, and builds up a synthetic swapping mechanism in a peer-to-peer way, which can adaptively choose different virtual swap pools. We implement the prototype of HybridSwap and evaluate it with some benchmarks in different scenarios. The evaluation results demonstrate that our solution improves guest swapping efficiency and doubles performance in some cases. Even in the worst case, the system overhead brought by HybridSwap is acceptable.

Keywords: virtualization; memory management; guest swapping; performance degradation


A High-efficiency FPGA-based Accelerator for Convolutional Neural Networks using Winograd Algorithm

Journal of Physics: Conference Series, 2018, Vol. 1026, No. 1

Authors: Y Huang, J Shen, Z Wang, M Wen, C Zhang. Affiliations: Science and Technology on Parallel and Distributed Laboratory, National University of Defence Technology, Changsha 410073, China; College of Computer, National University of Defence Technology, Changsha 410073, China

Convolutional neural networks (CNNs) are widely used in many computer vision applications. Previous FPGA implementations of CNNs are mainly based on the conventional convolution algorithm. However, the high arithmetic complexity of the conventional convolution algorithm restricts the performance of accelerators and significantly increases the design challenges. It has been shown that the Winograd algorithm for CNNs can effectively reduce the computational complexity. Although a few FPGA approaches based on the Winograd algorithm have been implemented, they lack an evaluation of performance for different tile sizes of the Winograd algorithm. In this work, we focus on exploring the possibility of using the Winograd algorithm to accelerate CNNs on FPGA. First, we propose an accelerator architecture applicable to both convolutional layers and fully connected layers. Second, we use a high-level synthesis tool to implement our design efficiently. Finally, we evaluate our accelerator with different tile sizes in terms of resource utilization, performance and efficiency. On the VUS440 platform, we achieve an average of 943 GOPS for overall VGG16 under low resource utilization, which reaches higher efficiency than the state-of-the-art works on FPGAs.
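The arithmetic saving mentioned in this abstract can be illustrated with the smallest one-dimensional Winograd case, commonly written F(2,3): two outputs of a 3-tap filter computed with 4 multiplications instead of 6. The sketch below is a textbook illustration in plain Python, not the accelerator design from the paper; the function names `direct_f23` and `winograd_f23` are hypothetical.

```python
def direct_f23(d, g):
    """Two outputs of a 3-tap sliding dot product, using 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Same two outputs via Winograd F(2,3), using 4 multiplications.

    d: 4 input samples, g: 3 filter taps.
    The filter-side sums (g0 + g1 + g2) / 2 and (g0 - g1 + g2) / 2 depend only
    on the filter, so in a CNN they can be precomputed once per filter.
    """
    g_plus = (g[0] + g[1] + g[2]) / 2
    g_minus = (g[0] - g[1] + g[2]) / 2

    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * g_plus
    m3 = (d[2] - d[1]) * g_minus
    m4 = (d[1] - d[3]) * g[2]

    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    d = [1, 2, 3, 4]
    g = [1, 0, -1]
    print(direct_f23(d, g))
    print(winograd_f23(d, g))
```

Larger tile sizes trade more additions and wider intermediate values for fewer multiplications per output, which is the kind of trade-off the abstract says it evaluates.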


Corrigendum to "A Composite Model of Wound Segmentation Based on Traditional Methods and Deep Neural Networks"

Computational Intelligence and Neuroscience, 2018, Vol. 2018, No. 1, Article 4967290

Authors: Fangzhao Li, Changjian Wang, Xiaohui Liu, Yuxing Peng, Shiyao Jin. Affiliations: Science and Technology on Parallel and Distributed Laboratory, National University of Defense Technology, Changsha, China; College of Computer, National University of Defense Technology, Changsha, China

[This corrects the article DOI: 10.1155/2018/4149103.].


Large-scale virtual machines provisioning in clouds: challenges and approaches

Frontiers of Computer Science, 2016, Vol. 10, No. 1, pp. 2-18

Authors: Zhaoning ZHANG, Dongsheng LI, Kui WU. Affiliations: National Laboratory for Parallel and Distributed Processing, School of Computer Science, National University of Defense Technology, Changsha 410073, China; Computer Science Department, University of Victoria, Victoria V8W 2Y2, Canada

The global data center market has grown explosively in recent years. As the market grows, the demand for fast provisioning of virtual resources to support elastic, manageable, and economical computing over the cloud becomes high. Fast provisioning of large-scale virtual machines (VMs), in particular, is critical to guarantee quality of service (QoS). In this paper, we systematically review the existing VM provisioning schemes and classify them into three main categories. We discuss the features and research status of each category, and introduce two recent solutions, VMThunder and VMThunder+, both of which can provision hundreds of VMs in seconds.

Keywords: cloud computing; IaaS; large scale virtual machine provisioning


Learning non-local image diffusion for image denoising

arXiv, 2017

Authors: Qiao, Peng; Dou, Yong; Feng, Wensen; Chen, Yunjin. Affiliations: National Laboratory for Parallel and Distributed Processing, School of Computer, National University of Defense Technology, Changsha 410073, China; School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing

Image diffusion plays a fundamental role in the task of image denoising. The recently proposed trainable nonlinear reaction diffusion (TNRD) model defines a simple but very effective framework for image denoising. However, as the TNRD model is a local model, whose diffusion behavior is purely controlled by information from local patches, it is prone to create artifacts in homogeneous regions and to over-smooth highly textured regions, especially in the case of strong noise levels. Meanwhile, it is widely known that the non-local self-similarity (NSS) prior is an effective image prior for image denoising, and it has been widely exploited in many non-local methods. In this work, we are motivated to embed the NSS prior into the TNRD model to tackle its weaknesses. In order to preserve the expected property that end-to-end training is available, we exploit the NSS prior through a set of non-local filters, and derive our proposed trainable non-local reaction diffusion (TNLRD) model for image denoising. Together with the local filters and influence functions, the non-local filters are learned by employing loss-specific training. The experimental results show that the trained TNLRD model produces visually plausible recovered images with more textures and fewer artifacts, compared to its local versions. Moreover, the trained TNLRD model can achieve strongly competitive performance relative to recent state-of-the-art image denoising methods in terms of peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM). Copyright © 2017, The Authors. All rights reserved.

Keywords: Image denoising


A distributed Relation Detection Approach in the Internet of Things

Mobile Information Systems, 2017, Vol. 2017, No. 1

Authors: Zhu, Weiping; Lu, Hongliang; Cui, Xiaohui; Cao, Jiannong. Affiliations: International School of Software, Wuhan University, Wuhan, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China; Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong

In the Internet of Things, it is important to detect the various relations among objects for mining useful knowledge. Existing works on relation detection are based on centralized processing, which is not suitable for the Internet of Things owing to the unavailability of a server, one-point failure, computation bottleneck, and moving of objects. In this paper, we propose a distributed approach to detect relations among objects. We first build a system model for this problem that supports generic forms of relations and both physical time and logical time. Based on this, we design the distributed Relation Detection Approach (DRDA), which utilizes a distributed spanning tree to detect relations using in-network processing. DRDA can coordinate the distributed tree-building process of objects and automatically change the depth of the routing tree to a proper value. Optimization among multiple relation detection tasks is also considered. Extensive simulations were performed and the results show that the proposed approach outperforms existing approaches in terms of the energy consumption. © 2017 Weiping Zhu et al.

Keywords: Internet of things


An improved algorithm for reconstructing singular connection in the multi-block CFD applications

arXiv, 2017

Authors: Yong-Xian, Wang; Li-Lun, Zhang; Yong-Gang, Che; Chuan-Fu, Xu; Wei, Liu; Hua-Yong, Liu; Zheng-Hua, Wang. Affiliations: College of Computer, National University of Defense Technology, Changsha 410073, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China; State Key Laboratory of Aerodynamics, Mianyang 621000, China

In this paper, an improved algorithm is proposed for the reconstruction of singularity connectivity from the available pairwise connections during the preprocessing phase. To evaluate the performance of our algorithm, we employ an in-house CFD code that uses a high-order finite-difference method for spatial discretization and runs on the Tianhe-1A supercomputer. Test cases with varying numbers of mesh points are chosen, and the test results indicate that the improved singular connection reconstruction algorithm can achieve at least a 2000× speedup compared with the naive search method adopted in the former version of our code. Moreover, parallel efficiency benefits from the local communication strategy based on the new algorithm. Copyright © 2017, The Authors. All rights reserved.

Keywords: Finite difference method


False-positive probability and compression optimization for tree-structured bloom filters

ACM Transactions on Modeling and Performance Evaluation of Computing Systems, 2016, Vol. 1, No. 4, pp. 1-39

Authors: Fu, Yongquan; Biersack, Ernst. Affiliations: National Key Laboratory for Parallel and Distributed Processing, College of Computer Science, National University of Defense Technology, Sanyi Road, Changsha, Hunan Province 410073, China; CAIPY, Valbonne 06560, France

Bloom filters are frequently used to check the membership of an item in a set. However, Bloom filters face a dilemma: the transmission bandwidth and the accuracy cannot be optimized simultaneously. This dilemma is particularly severe for transmitting Bloom filters to remote nodes when the network bandwidth is limited. We propose a novel Bloom filter called BloomTree that consists of a tree-structured organization of smaller Bloom filters, each using a set of independent hash functions. BloomTree spreads items across levels that are compressed to reduce the transmission bandwidth needed. We show how to find optimal configurations for BloomTree and investigate in detail by how much BloomTree outperforms the standard Bloom filter or the compressed Bloom filter. Finally, we use the intersection of BloomTrees to predict the set intersection, decreasing the false-positive probabilities by several orders of magnitude compared to both the compressed Bloom filter and the standard Bloom filter.

Keywords: Genetic algorithms
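For readers unfamiliar with the baseline this abstract compares against, the following is a minimal sketch of a standard Bloom filter, not of BloomTree itself. The class name `BloomFilter`, the SHA-256 based double hashing, and the helper `expected_false_positive_rate` are illustrative assumptions; the false-positive estimate is the usual textbook approximation (1 - e^(-kn/m))^k for m bits, k hash functions and n inserted items.

```python
import hashlib
import math


class BloomFilter:
    """Standard Bloom filter: m bits, k hash positions per item."""

    def __init__(self, num_bits, num_hashes):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k positions from two 64-bit halves of one SHA-256 digest
        # (double hashing); this hashing scheme is an illustrative choice.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(
            self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item)
        )


def expected_false_positive_rate(num_bits, num_hashes, num_items):
    """Textbook approximation of the false-positive probability."""
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


if __name__ == "__main__":
    bf = BloomFilter(num_bits=1024, num_hashes=4)
    for name in ["alice", "bob", "carol"]:
        bf.add(name)
    print("alice" in bf)
    print(expected_false_positive_rate(1024, 4, 3))
```

The bandwidth versus accuracy dilemma described in the abstract shows up directly in this formula: shrinking `num_bits` to save transmission cost raises the false-positive rate for a fixed number of items.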
