Search Results - Inner Mongolia University Library

IEEE International Symposium on Parallel and Distributed Processing with Applications and IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC)

Authors: Yan Hou, Yijie Wang, Xingkong Ma, Li Cheng (College of Computer, National University of Defense Technology, Changsha; National Laboratory for Parallel and Distributed Processing, China)

ISBN: (print) 9781538637913

The application of Support Vector Machines (SVM) to data streams is growing with the increasing demand for real-time processing in classification tasks such as anomaly detection and real-time image processing. However, the high volume and fast arrival rate of live data make it challenging to apply SVM in data stream processing. Existing SVM implementations are mostly designed for batch processing and, because of SVM's inherent complexity, can hardly satisfy the efficiency requirements of stream processing. To address these challenges, we propose a high-efficiency distributed SVM framework over data streams (HDSVM), which consists of two main algorithms: an incremental learning algorithm and a distributed algorithm. First, we propose a partial support vectors reserving incremental learning algorithm (PSVIL). By updating the SVM with a subset of support vectors, selected by their distances to the classification hyperplane, instead of the universal set, the algorithm achieves lower time overhead while preserving accuracy. Second, we propose a distribution remaining partition and fast aggregation distributed algorithm (DRPFA) for SVM. The real-time data is partitioned by clustering according to its original distribution rather than at random, and historical support vectors are partitioned based on their distances to the classification hyperplane. Thanks to this partition strategy, the global hyperplane can be obtained by averaging the parameters of the local hyperplanes. Extensive experiments on Apache Storm show that the proposed HDSVM achieves lower time overhead and similar accuracy compared with the state of the art. The speed-up ratio increases by 2-8 times within a 1% accuracy deviation.

Keywords: Support vector machines; Training; Distributed databases; Distributed algorithms; Real-time systems; Partitioning algorithms; Streaming media
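The two ideas named in the abstract above, keeping only some support vectors according to their distance to the hyperplane and averaging local hyperplane parameters into a global one, can be illustrated with a minimal sketch. It assumes linear hyperplanes of the form w·x + b = 0, and it assumes that the vectors closest to the hyperplane are the ones kept; the abstract does not state the actual selection rule. All function names are hypothetical and are not taken from the paper.

```python
from math import sqrt


def distance_to_hyperplane(x, w, b):
    """Geometric distance from point x to the hyperplane w.x + b = 0."""
    norm = sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def select_support_vectors(vectors, w, b, keep):
    """Keep the `keep` vectors nearest the hyperplane.

    Assumption: "nearest first" is an illustrative rule, not the paper's.
    """
    ranked = sorted(vectors, key=lambda x: distance_to_hyperplane(x, w, b))
    return ranked[:keep]


def average_hyperplanes(local_models):
    """Average (w, b) pairs from local linear SVMs into one global (w, b)."""
    n = len(local_models)
    dim = len(local_models[0][0])
    w = [sum(model[0][i] for model in local_models) / n for i in range(dim)]
    b = sum(model[1] for model in local_models) / n
    return w, b
```

Plain averaging is only meaningful when each worker sees data with a similar distribution, which is the property the abstract attributes to its clustering-based partition.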

Detailed and clock-driven simulation for HPC interconnection network

Frontiers of Computer Science, 2016, Vol. 10, No. 5, pp. 797-811

Authors: Wenhao ZHOU, Juan CHEN, Chen CUI, Qian WANG, Dezun DONG, Yuhua TANG (State Key Laboratory of High Performance Computing, School of Computer, National University of Defense Technology, Changsha 410073, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China)

The performance and energy consumption of high performance computing (HPC) interconnection networks are of great significance to the whole supercomputer, and building an HPC interconnection network simulation platform is very important for research on HPC software and hardware technologies. To effectively evaluate the performance and energy consumption of HPC interconnection networks, this article designs and implements a detailed, clock-driven HPC interconnection network simulation platform called HPC-NetSim. HPC-NetSim uses application-driven workloads and inherits the characteristics of the detailed and flexible cycle-accurate network simulator. In addition, it offers a large set of configurable network parameters for topology and routing, and supports router on/off states. We compare the simulated execution time with the real execution time of a Tianhe-2 subsystem, and the mean error is only 2.7%. We also simulate network behaviors with different network structures and low-power modes; the results are consistent with the theoretical analyses.

Keywords: high performance computing; clock-driven simulation; interconnection network; BookSim

Mechanism of floating body effect mitigation via cutting off source injection in a fully-depleted silicon-on-insulator technology

Chinese Physics B, 2016, Vol. 25, No. 3, pp. 283-289

Authors: Huang Pengcheng (黄鹏程), Chen Shuming (陈书明), Chen Jianjun (陈建军) (College of Computer, National University of Defense Technology, Changsha 410073, China; National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha 410073, China)

In this paper, the influence of the floating body effect (FBE) on the single event transient (SET) generation mechanism in fully depleted (FD) silicon-on-insulator (SOI) technology is investigated using three-dimensional technology computer-aided design (3D-TCAD) numerical simulation. The results indicate that the main SET generation mechanism is not carrier drift/diffusion but the FBE, for both positive and negative channel metal oxide semiconductor (PMOS and NMOS) devices. Two stacking layout designs that mitigate the FBE are also investigated, and the results indicate that the in-line stacking (IS) layout can mitigate the FBE completely and saves area penalty compared with the conventional stacking layout.

Keywords: floating body effect; in-line stacking; silicon-on-insulator; source injection

Effect of supply voltage and body-biasing on single-event transient pulse quenching in bulk fin field-effect-transistor process

Chinese Physics B, 2016, Vol. 25, No. 4, pp. 495-500

Authors: Yu Junting (于俊庭), Chen Shuming (陈书明), Chen Jianjun (陈建军), Huang Pengcheng (黄鹏程), Song Ruiqiang (宋睿强) (College of Computer, National University of Defense Technology, Changsha 410073, China; National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha 410073, China)

Charge sharing is becoming an important topic as the feature size scales down in fin field-effect-transistor (FinFET) technology. However, studies of charge-sharing-induced single-event transient (SET) pulse quenching in bulk FinFET are seldom reported. Using three-dimensional technology computer-aided design (3D-TCAD) mixed-mode simulations, the effects of supply voltage and body-biasing on SET pulse quenching are investigated for the first time in a bulk FinFET process. The results indicate that, owing to an enhanced charge sharing effect, the propagating SET pulse width decreases as the supply voltage is reduced. Moreover, compared with reverse body-biasing (RBB), the circuit with forward body-biasing (FBB) is vulnerable to charge sharing and can effectively mitigate the propagating SET pulse width up to 53% at least. This can provide guidance for radiation-hardened bulk FinFET technology, especially in low-power and high-performance applications.

Keywords: body-biasing; SET pulse quenching; charge sharing; bulk FinFET process

Aircraft Detection in Remote Sensing Images via CNN Multi-scale Feature Representation

2017 2nd International Conference on Software, Multimedia and Communication Engineering (SMCE 2017)

Authors: Jia-qi WANG, Xin NIU, Peng ZHANG, Yong DOU, Fei XIA (National Laboratory for Parallel and Distributed Processing, National University of Defense Technology; Institute of Electronic Information Warfare, Naval University of Engineering)

Aircraft detection in remote sensing images is an intractable *** current aircraft detection methods have limited representative capabilities and heavy computational *** paper studies how to apply multi-scale feature representation of Convolutional Neural Networks (CNN) to aircraft detection by qualitatively and quantitatively analyzing the performance of Single Shot Detection (SSD) *** first, we find that low-level detectors are not robust enough to detect as the semantic gap *** we propose a data driven hyper-parameter selection method to alleviate this problem by determining appropriate hyper-parameters of sliding window and default box ***, we employ a multi-scale training strategy to enhance low-level predictive ***, we propose an accurate and efficient aircraft detection *** results illustrate that our method could achieve 96.84% AP at 20 FPS on NVIDIA TITAN *** with original SSD method, our proposed approach achieved 2.13% AP improvement.

Keywords: Aircraft detection; Remote sensing; Single Shot Detection (SSD); CNN

Automatic generation of fast BLAS3-GEMM: A portable compiler approach

International Symposium on Code Generation and Optimization (CGO)

Authors: Xing Su, Xiangke Liao, Jingling Xue (College of Computer, National Laboratory for Parallel and Distributed Processing, Changsha, China; UNSW, School of Computer Science and Engineering, Sydney, NSW, Australia)

ISBN: (print) 9781509049318

GEMM is the main computational kernel in BLAS3. Its micro-kernel is either hand-crafted in assembly code or generated from C code by general-purpose compilers (guided by architecture-specific directives or auto-tuning). Therefore, either performance or portability suffers. We present a POrtable Compiler Approach, Poca, implemented in LLVM, to automatically generate and optimize this micro-kernel in an architecture-independent manner, without involving domain experts. The key insight is to leverage a wide range of architecture-specific abstractions already available in LLVM, by first generating a vectorized micro-kernel in the architecture-independent LLVM IR and then improving its performance by applying a series of domain-specific yet architecture-independent optimizations. The optimized micro-kernel drops easily in existing GEMM frameworks such as BLIS and OpenBLAS. Validation focuses on optimizing GEMM in double precision on two architectures. On Intel Sandybridge and AArch64 Cortex-A57, Poca's micro-kernels outperform expert-crafted assembly code by 2.35% and 7.54%, respectively, and both BLIS and OpenBLAS achieve competitive or better performance once their micro-kernels are replaced by Poca's.

Keywords: Kernel; Optimization; Computer architecture; Linear algebra; Libraries; Programming; Program processors
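For readers unfamiliar with the term, the role a micro-kernel plays inside GEMM can be sketched as follows: the outer loops walk over small tiles of C, and the micro-kernel updates one tile through a sequence of rank-1 updates. This is a plain illustration of that structure only, not Poca's generated code and not the BLIS or OpenBLAS interface. The tile sizes `MR` and `NR` and both function names are hypothetical.

```python
MR, NR = 2, 2  # hypothetical tile sizes; real kernels choose them per architecture


def micro_kernel(a, b, c, i0, j0, k):
    """Update the MR x NR tile of C at (i0, j0) with k rank-1 updates."""
    for p in range(k):
        for i in range(MR):
            a_ip = a[i0 + i][p]  # one element of the current column of A
            for j in range(NR):
                c[i0 + i][j0 + j] += a_ip * b[p][j0 + j]


def gemm(a, b):
    """Compute C = A * B by tiling C and calling the micro-kernel per tile."""
    m, k, n = len(a), len(b), len(b[0])
    if m % MR or n % NR:
        raise ValueError("this sketch requires dimensions divisible by the tile size")
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, MR):
        for j0 in range(0, n, NR):
            micro_kernel(a, b, c, i0, j0, k)
    return c
```

In an optimized library the innermost two loops are the part that is vectorized and kept in registers, which is why the abstract treats the micro-kernel as the performance-critical piece.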

Large-scale virtual machines provisioning in clouds: challenges and approaches

Frontiers of Computer Science, 2016, Vol. 10, No. 1, pp. 2-18

Authors: Zhaoning ZHANG, Dongsheng LI, Kui WU (National Laboratory for Parallel and Distributed Processing, School of Computer Science, National University of Defense Technology, Changsha 410073, China; Computer Science Department, University of Victoria, Victoria V8W 2Y2, Canada)

The global data center market has grown explosively in recent years. As the market grows, demand rises for fast provisioning of virtual resources to support elastic, manageable, and economical computing over the cloud. Fast provisioning of large-scale virtual machines (VMs), in particular, is critical to guaranteeing quality of service (QoS). In this paper, we systematically review the existing VM provisioning schemes and classify them into three main categories. We discuss the features and research status of each category, and introduce two recent solutions, VMThunder and VMThunder+, both of which can provision hundreds of VMs in seconds.

Keywords: cloud computing; IaaS; large scale virtual machine provisioning

HPDedup: A Hybrid prioritized data deduplication mechanism for primary storage in the cloud

33rd International Conference on Massive Storage Systems and Technology, MSST 2017

Authors: Wu, Huijun Wang, Chen Fu, Yinjin Sakr, Sherif Zhu, Liming Lux, Kai Data CSIRO University of New South Wales Australia PLA University of Science and Technology China Science and Technology on Parallel and Distributed Laboratory State Key Laboratory of High Performance Computing State Key Lab. of High-end Server and Storage Technology Coll. of Computer Natl. Univ. of Def. Technol. China

Eliminating duplicate data in primary storage of clouds increases the cost-efficiency of cloud service providers as well as reduces the cost of users for using cloud services. Most existing primary deduplication techniques either use inline caching to exploit locality in primary workloads or use postprocessing deduplication running in system idle time to avoid the negative impact on I/O performance. However, neither of them works well in the cloud servers running multiple services or applications for the following two reasons: Firstly, the temporal locality of duplicate data writes may not exist in some primary storage workloads thus inline caching often fails to achieve good deduplication ratio. Secondly, the post-processing deduplication allows duplicate data to be written to disks, therefore does not provide the benefit of I/O deduplication and requires high peak storage capacity. This paper presents HPDedup, a Hybrid Prioritized data Deduplication mechanism to deal with the storage system shared by applications running in co-located virtual machines or containers by fusing an inline and a post-processing process for exact deduplication. In the inline deduplication phase, HPDedup gives a fingerprint caching mechanism that estimates the temporal locality of duplicates in data streams from different VMs or applications and prioritizes the cache allocation for these streams based on the estimation. HPDedup also allows different deduplication threshold for streams based on their spatial locality to reduce the disk fragmentation. The post-processing phase removes duplicates whose fingerprints are not able to be cached due to weak temporal locality from disks. The hybrid deduplication mechanism significantly reduces the amount of redundant data written to the storage system while maintaining inline data writing performance. Our experimental results show that HPDedup clearly outperforms the state-of-the-art primary storage deduplication techniques in terms of inline cac

Keywords: Efficiency
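The hybrid scheme described in the abstract above, an inline phase backed by a bounded fingerprint cache plus a post-processing phase that removes the duplicates the cache missed, can be sketched roughly as below. This is a deliberately simplified illustration: it uses a single LRU cache in place of the per-stream, locality-based cache prioritization that the abstract describes, and the class and method names are hypothetical.

```python
import hashlib
from collections import OrderedDict


class HybridDedup:
    """Toy inline + post-processing deduplicator (illustrative only)."""

    def __init__(self, cache_size):
        self.cache_size = cache_size
        self.cache = OrderedDict()  # fingerprint -> True, kept in LRU order
        self.disk = []              # (fingerprint, data) blocks written so far

    def write(self, data: bytes) -> bool:
        """Inline phase: return True if the cache caught a duplicate."""
        fp = hashlib.sha256(data).hexdigest()
        if fp in self.cache:
            self.cache.move_to_end(fp)  # refresh recency on a hit
            return True
        self.disk.append((fp, data))    # cache miss: the block reaches disk
        self.cache[fp] = True
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return False

    def post_process(self) -> int:
        """Offline phase: drop on-disk duplicates; return how many were removed."""
        seen, unique = set(), []
        for fp, data in self.disk:
            if fp not in seen:
                seen.add(fp)
                unique.append((fp, data))
        removed = len(self.disk) - len(unique)
        self.disk = unique
        return removed
```

With a small cache, a duplicate that arrives after its fingerprint has been evicted is written to disk during the inline phase and is only reclaimed later by `post_process`, which is the weak-temporal-locality case the abstract assigns to the post-processing phase.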

Learning non-local image diffusion for image denoising

arXiv, 2017

Authors: Qiao, Peng Dou, Yong Feng, Wensen Chen, Yunjin National Laboratory for Parallel and Distributed Processing School of Computer National University of Defense Technology Changsha 410073 China School of Automation and Electrical Engineering University of Science and Technology Beijing Beijing

Image diffusion plays a fundamental role in image denoising. The recently proposed trainable nonlinear reaction diffusion (TNRD) model defines a simple but very effective framework for image denoising. However, because the TNRD model is a local model whose diffusion behavior is controlled purely by information from local patches, it is prone to creating artifacts in homogeneous regions and over-smoothing highly textured regions, especially at strong noise levels. Meanwhile, it is widely known that the non-local self-similarity (NSS) prior is an effective image prior for image denoising and has been widely exploited in many non-local methods. In this work, we embed the NSS prior into the TNRD model to tackle its weaknesses. To preserve end-to-end training, we exploit the NSS prior through a set of non-local filters and derive our proposed trainable non-local reaction diffusion (TNLRD) model for image denoising. Together with the local filters and influence functions, the non-local filters are learned by loss-specific training. The experimental results show that the trained TNLRD model produces visually plausible recovered images with more textures and fewer artifacts than its local versions. Moreover, the trained TNLRD model achieves performance strongly competitive with recent state-of-the-art image denoising methods in terms of peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM). Copyright © 2017, The Authors. All rights reserved.

Keywords: Image denoising
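As background for the PSNR figure of merit mentioned in the abstract above, the standard definition is PSNR = 10 log10(MAX^2 / MSE), where MSE is the mean squared error between the reference and the restored image. The sketch below assumes images flattened to equal-length sequences of pixel values and an 8-bit peak of 255; the function name is hypothetical and the code is not part of the paper's evaluation.

```python
from math import log10


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flattened images."""
    if not reference or len(reference) != len(restored):
        raise ValueError("images must be non-empty and the same size")
    mse = sum((r - d) ** 2 for r, d in zip(reference, restored)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images: no noise at all
    return 10.0 * log10(max_value ** 2 / mse)
```

A higher value means the restored image is closer to the reference, so denoising methods are compared by how much they raise PSNR at a given noise level.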

A distributed Relation Detection Approach in the Internet of Things

Mobile Information Systems, 2017, Vol. 2017, No. 1

Authors: Zhu, Weiping Lu, Hongliang Cui, Xiaohui Cao, Jiannong International School of Software Wuhan University Wuhan China Science and Technology on Parallel and Distributed Processing Laboratory National University of Defense Technology Changsha China Department of Computing Hong Kong Polytechnic University Kowloon Hong Kong

In the Internet of Things, it is important to detect the various relations among objects for mining useful knowledge. Existing works on relation detection are based on centralized processing, which is not suitable for the Internet of Things owing to the unavailability of a server, one-point failure, computation bottleneck, and moving of objects. In this paper, we propose a distributed approach to detect relations among objects. We first build a system model for this problem that supports generic forms of relations and both physical time and logical time. Based on this, we design the distributed Relation Detection Approach (DRDA), which utilizes a distributed spanning tree to detect relations using in-network processing. DRDA can coordinate the distributed tree-building process of objects and automatically change the depth of the routing tree to a proper value. Optimization among multiple relation detection tasks is also considered. Extensive simulations were performed and the results show that the proposed approach outperforms existing approaches in terms of the energy consumption. © 2017 Weiping Zhu et al.

Keywords: Internet of things
