Search results - Inner Mongolia University Library

4th International Conference on Computer Engineering and Networks, CENet2014

Authors: Li, Teng; Dou, Yong; Jiang, Jingfei; Qiao, Peng. Department of Computer Science, National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha 410073, China

ISBN (print): 9783319111032

Moving object detection is important in traffic video analysis, and a growing number of algorithms are being applied to it. Most of these algorithms are time-consuming and cannot meet the real-time demands of traffic video analysis when run sequentially on a conventional central processing unit (CPU). The emergence of the graphics processing unit (GPU) and the multi-core CPU provides a way to accelerate these algorithms to meet that demand. In this study, we provide a GPU-accelerated implementation of the background subtraction algorithm and the morphological operation algorithm. Then, the connected component labeling algorithm is parallelized on a multi-core CPU with Open Multi-Processing (OpenMP). Furthermore, a parallel pipeline implementation on a heterogeneous platform is proposed by integrating the aforementioned algorithms. Experimental results show that the proposed implementation achieves a speedup of up to 5× compared with a sequential implementation on a CPU. © Springer International Publishing Switzerland 2015.
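The three stages named in the abstract (background subtraction, a morphological operation, connected component labeling) can be illustrated with a minimal sequential sketch. It is plain Python for readability and is not the authors' GPU/OpenMP implementation; the function names, the fixed threshold, the cross-shaped erosion, and the 4-connectivity are all assumptions made for illustration.

```python
from collections import deque

def subtract_background(frame, background, threshold=30):
    """Mark pixels whose absolute difference from the background exceeds the threshold."""
    return [[1 if abs(p - b) > threshold else 0 for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

def erode(mask):
    """Binary erosion with a cross-shaped element: keep a pixel only if it
    and its four neighbours are set (pixels outside the image count as 0)."""
    h, w = len(mask), len(mask[0])
    def at(y, x):
        return mask[y][x] if 0 <= y < h and 0 <= x < w else 0
    cross = ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1))
    return [[1 if all(at(y + dy, x + dx) for dy, dx in cross) else 0
             for x in range(w)] for y in range(h)]

def label_components(mask):
    """4-connected component labeling by breadth-first flood fill.
    Returns (label image, number of components); labels start at 1."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count
```

The first two stages are per-pixel and independent, which is what makes them natural GPU candidates; the labeling stage has data-dependent traversal, which is consistent with the abstract assigning it to the multi-core CPU.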

Keywords: Program processors

Deadline-oriented task scheduling for MapReduce environments

15th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2015

Authors: Hu, Minghao; Wang, Changjian; You, Pengfei; Huang, Zhen; Peng, Yuxing. National Laboratory for Parallel and Distributed Processing, School of Computer Science, National University of Defense Technology, Changsha 410072, China

ISBN (digital): 9783319271224

ISBN (print): 9783319271217

To provide timely results for 'Big Data Analytics', it is crucial to satisfy deadline requirements for MapReduce jobs in production environments. In this paper, we propose a deadline-oriented task scheduling approach, named Dart, to meet the given deadline and maximize the input size if only part of the dataset can be processed before the time limit. Dart uses an iterative estimation method, based on both historical data and job running status, to precisely estimate the real-time job completion time. By comparing the estimated time with the deadline constraint, a YARN-based task scheduler dynamically decides whether to continue or terminate the map *** have validated our approach using workloads from OpenCloud and Facebook on a cluster of 60 virtual machines. The results show that Dart can not only effectively meet the deadline but also process near-maximal data volumes, even when the deadline is extremely short and limited resources are allocated. © Springer International Publishing Switzerland 2015.
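The continue-or-terminate decision described in the abstract can be sketched as follows. The abstract does not give Dart's estimator, so the blending of a historical per-task time with the observed rate, the wave-based remaining-time model, and every name and parameter below are hypothetical and shown only to make the idea concrete.

```python
import math

def estimate_completion(elapsed, done_tasks, total_tasks, slots,
                        hist_task_time, reduce_time, weight=0.5):
    """Estimate total job time if the map phase runs to the end.

    Hypothetical model: per-task time is a weighted blend of a historical
    average and the rate observed so far; remaining map tasks run in waves
    of `slots` parallel tasks; the reduce time is taken as given.
    """
    if done_tasks:
        observed = elapsed * slots / done_tasks
    else:
        observed = hist_task_time  # no running status yet, rely on history
    per_task = weight * hist_task_time + (1 - weight) * observed
    waves = math.ceil((total_tasks - done_tasks) / slots)
    return elapsed + waves * per_task + reduce_time

def should_continue_map(estimated_total, deadline):
    """Keep launching map tasks only while the estimate still fits the deadline."""
    return estimated_total <= deadline
```

Re-running the estimate each time a task finishes gives the iterative flavour the abstract mentions: as more tasks complete, the observed term reflects the job's actual running status more closely.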

Keywords: MapReduce

Hyperspectral image classification via kernel extreme learning machine using local receptive fields

IEEE International Conference on Image Processing

Authors: Qi Lv; Xin Niu; Yong Dou; Yueqing Wang; Jiaqing Xu; Jie Zhou. College of Computer, National University of Defense Technology, Changsha, China; National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha, China; National University of Defense Technology, Changsha, Hunan, CN

This paper proposes a classification approach for hyperspectral images (HSI) using a kernel extreme learning machine based on local receptive fields. The extreme learning machine (ELM) has drawn increasing attention in the pattern recognition field due to its simplicity, speed, and good generalization ability. A kernel method is often used to improve ELM's performance, which is known as kernel ELM. The local receptive field concept originates from research in neuroscience. Considering the local correlations of spectral features, it is promising to improve the performance of HSI classification by combining local receptive fields with kernel ELM. Experimental results on the Pavia University dataset confirm the effectiveness of the proposed HSI classification method.
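For orientation, the kernel ELM mentioned in the abstract is usually written in the closed form below. This is the standard formulation from the ELM literature and not necessarily the exact variant used in the paper; the symbols $C$ (regularization constant), $\mathbf{T}$ (matrix of training targets), $N$ (number of training samples), and $K$ (kernel function) are notation assumed here.

```latex
f(\mathbf{x}) =
\begin{bmatrix} K(\mathbf{x}, \mathbf{x}_1) & \cdots & K(\mathbf{x}, \mathbf{x}_N) \end{bmatrix}
\left( \frac{\mathbf{I}}{C} + \boldsymbol{\Omega} \right)^{-1} \mathbf{T},
\qquad
\Omega_{ij} = K(\mathbf{x}_i, \mathbf{x}_j)
```

Training therefore reduces to solving one regularized linear system in the kernel matrix $\boldsymbol{\Omega}$, which is the source of the speed the abstract attributes to ELM.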

Keywords: Kernel; Training; Hyperspectral imaging; Convolution; Neurons; Feature extraction

Hierarchical routing algorithm for ad hoc networks using mobile VMN

International Journal of Autonomous and Adaptive Communications Systems, 2016, Vol. 9, No. 1-2, pp. 40-56

Authors: Yan, Guofeng; Peng, Yuxing; Liu, Junyi; Chen, Shuhong. School of Computer and Communication, Hunan Institute of Engineering, Xiangtan 411101, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China; School of Computer, National University of Defense Technology, Changsha 410073, China; School of Information Science and Engineering, Central South University, Changsha 410083, China

One of the most significant challenges for routing protocols in mobile networks is coping with the unpredictable motion and unreliable behaviour of mobile nodes. In this paper, we present a hierarchical routing algorithm based on virtual mobile nodes (VMNs). A routing path can be found rapidly by exchanging path information between VMNs, without accurate topology information. We discuss the routing process and the implementation details of the proposed routing algorithm. Finally, we evaluate the performance of our hierarchical routing algorithm through simulations; the results show that the number of mobile WAVE generated by the routing algorithm satisfies the quality of service (QoS) requirements of all source and destination nodes. Furthermore, we compare VMN failure and message delivery ratio under hierarchical and non-hierarchical routing approaches; the results show that HRA-VMN achieves order-of-magnitude better performance than hierarchical state routing (HSR) and the non-hierarchical routing approach. Copyright © 2016 Inderscience Enterprises Ltd.

Keywords: Routing algorithms

Partial clones for stragglers in MapReduce

International Conference of Young Computer Scientists, Engineers and Educators, ICYCSEE 2015

Authors: Li, Jia; Wang, Changjian; Li, Dongsheng; Huang, Zhen. National Laboratory for Parallel and Distributed Processing, School of Computer Science, National University of Defense Technology, Changsha 410073, China

ISBN (print): 9783662462478

Stragglers can delay jobs and seriously reduce cluster efficiency. Much research has been devoted to this problem, including Blacklist [8], speculative execution [1, 6], and Dolly [8]. In this paper, we put forward a new approach for mitigating stragglers in MapReduce, named Hummer. It starts task clones only for tasks at high risk of delay. Experiments show that it can decrease the risk of job delay while consuming fewer resources. For small jobs, Hummer also improves job completion time by 48% and 10% compared to LATE and Dolly, respectively. © Springer-Verlag Berlin Heidelberg 2015.

Keywords: MapReduce

Implementation of a fine-grained parallel full pipeline Schnorr–Euchner sphere decoder algorithm accelerator on field-programmable gate array

4th International Conference on Computer Engineering and Networks, CENet2014

Authors: Li, Shijie; Guo, Lei; Dou, Yong; Jiang, Jingfei. National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha 410073, China

ISBN (print): 9783319111032

A new parallel full pipeline accelerator implemented on a field-programmable gate array (FPGA) for the Schnorr–Euchner sphere decoding (SE–SD) algorithm is presented in this paper. We first transform the serial SE–SD algorithm into a parallel one. We then use multiple processing elements (PEs) to handle the workload (particularly the tree search in the SE–SD algorithm) in parallel. Each separated SE–SD search workload is divided evenly. Each PE searches a sub-tree using a multilevel pipeline to increase data throughput, and the whole system receives a batch of different input data in chronological order. Using a distribution unit, we select the number of PEs in our system according to the hardware platform. We have successfully placed four PEs in an accelerator and eight accelerators in a single FPGA (XC6VLX240T). The system benefits markedly from switching between acceleration modes, including latency-priority and throughput-priority modes. © Springer International Publishing Switzerland 2015.
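The tree search in SE–SD relies on the Schnorr–Euchner enumeration order: at each tree level, candidate integers are visited in zigzag order of increasing distance from a real-valued center. The sketch below shows only that ordering, in software; it says nothing about the paper's FPGA pipeline, and the function name is an assumption. A real decoder would also restrict candidates to the constellation and prune by the sphere radius, both omitted here.

```python
import math
from itertools import islice

def schnorr_euchner_order(center):
    """Yield integers in order of non-decreasing distance from `center`.

    Start at the nearest integer, then zigzag outward, stepping first
    toward the side on which `center` lies.
    """
    nearest = math.floor(center + 0.5)
    step = 1 if center >= nearest else -1
    yield nearest
    offset = 1
    while True:
        yield nearest + step * offset  # closer side first
        yield nearest - step * offset
        offset += 1

# Usage: first five candidates around two different centers.
first_a = list(islice(schnorr_euchner_order(2.3), 5))
first_b = list(islice(schnorr_euchner_order(2.7), 5))
```

Because the most promising child is always visited first, the first leaf reached tends to be a good candidate, which tightens the search radius early and is what makes sub-tree searches worth distributing across PEs.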

Keywords: MIMO systems

Software ranking and analysis based on mining market requirements and characteristics

7th Asia-Pacific Symposium on Internetware, Internetware 2015

Authors: Liu, Bingxun; Yin, Gang; Wang, Tao; Zhang, Fang; Wang, Huaimin. National Laboratory for Parallel and Distributed Processing, School of Computer Science, National University of Defense Technology, Changsha 410073, China

ISBN (print): 9781450336413

With the rapid growth of open source software, choosing software from many alternatives has become a great challenge. Traditional ranking approaches mainly focus on the characteristics of the software itself, such as quality, security, and reliability. In this paper, we investigate the market demand for software engineers and propose a novel approach for ranking software by analyzing the market requirements for specific software. We also summarize the characteristics of software advertisements and analyze why these situations emerge and how software market requirements are trending. Because industry always needs to balance several factors when selecting software, market demand can be a good indicator for ranking and evaluating software. This paper provides a different perspective and some interesting inferences on software market requirements, and it can be a valuable supplement to traditional ranking methods, as well as to software evaluation. © 2015 ACM.

Keywords: Open source software

Locality Protected Dynamic Cache Allocation Scheme on GPUs

IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)

Authors: Yang Zhang; Zuocheng Xing; Li Zhou; Chunsheng Zhu. National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha, China; School of Electronic Science and Engineering, National University of Defense Technology, Changsha, China; Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC, Canada

ISBN (print): 9781509032068

As we approach the exascale era in supercomputing, designing a balanced computer system with powerful computing ability and low energy consumption becomes increasingly important. The GPU is a widely used accelerator in recently deployed supercomputers. It uses massive multithreading to hide long latency and has high energy efficiency. In contrast to their strong computing power, GPUs have few on-chip resources, with several MB of fast on-chip memory storage per SM (Streaming Multiprocessor). GPU caches exhibit poor efficiency due to the mismatch between the throughput-oriented execution model and the cache hierarchy design. Because of the severe deficiency in on-chip memory, the benefit of the GPU's high computing capacity is dramatically reduced by poor cache performance, which limits system performance and energy efficiency. In this paper, we put forward a locality protected scheme to make full use of data locality within the fixed capacity. We present a Locality Protected method based on instruction PC (LPP) to improve GPU performance. First, we use a PC-based collector to collect the reuse information of each cache line. After obtaining the dynamic reuse information of the cache line, we use an intelligent cache allocation unit (ICAU), which coordinates the reuse information with the LRU (Least Recently Used) replacement policy, to find the cache line with the least locality for eviction. The results show that LPP provides a speedup of up to 17.8% and an average improvement of 5.5% over the baseline method.
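The two steps in the abstract (collect per-PC reuse information, then combine it with LRU to pick a victim) can be pictured with the toy model below. The abstract does not specify how LPP scores reuse or how the ICAU weighs it against recency, so the hits-per-fill score, the tie-breaking rule, and all names here are hypothetical.

```python
from collections import defaultdict

class ReuseCollector:
    """Toy PC-indexed collector: tracks how often lines filled by a given
    instruction (identified by its PC) are hit again before eviction."""

    def __init__(self):
        self.fills = defaultdict(int)
        self.hits = defaultdict(int)

    def record_fill(self, pc):
        self.fills[pc] += 1

    def record_hit(self, pc):
        self.hits[pc] += 1

    def predicted_reuse(self, pc):
        """Average hits per fill for this PC; 0.0 if the PC was never seen."""
        fills = self.fills[pc]
        return self.hits[pc] / fills if fills else 0.0

def choose_victim(cache_set, collector):
    """Pick the line with the lowest predicted reuse; break ties by LRU
    (smallest `last_used` timestamp). Each line is a dict with the
    hypothetical keys 'tag', 'pc', and 'last_used'."""
    return min(cache_set,
               key=lambda line: (collector.predicted_reuse(line["pc"]),
                                 line["last_used"]))
```

With plain LRU the oldest line is always evicted; in this sketch an old line brought in by a high-reuse instruction survives, and a newer line from a streaming instruction is evicted instead.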

Keywords: Graphics processing units; System-on-chip; Computer architecture; Instruction sets; Resource management; Optimization; Electronic mail

Enabling Tissue-Scale Cardiac Simulations Using Heterogeneous Computing on Tianhe-2

International Conference on Parallel and Distributed Systems (ICPADS)

Authors: Johannes Langguth; Qiang Lan; Namit Gaur; Xing Cai; Mei Wen; Chun-Yuan Zhang. Simula Research Laboratory, Lysaker, Norway; College of Computer, National University of Defense Technology, Changsha, China; National Key Laboratory of Parallel and Distributed Processing, Changsha, China; Department of Informatics, University of Oslo, Oslo, Norway

ISBN (print): 9781509053827

We develop a simulator for 3D tissue of the human cardiac ventricle with a physiologically realistic cell model and deploy it on the supercomputer Tianhe-2. In order to attain the full performance of the heterogeneous CPU-Xeon Phi design, we use carefully optimized codes for both devices and combine them to obtain suitable load balancing. Using a large number of nodes, we are able to perform tissue-scale simulations of the electrical activity and calcium handling in millions of cells, at a level of detail that tracks the states of trillions of ryanodine receptors. We can thus simulate arrhythmogenic spiral waves and other complex arrhythmogenic patterns that arise from calcium handling deficiencies in human cardiac ventricle tissue. Owing to extensive code tuning and parallelization via OpenMP, MPI, and SCIF/COI, large-scale simulations of 10 heartbeats can be performed in a matter of hours. Test results indicate excellent scalability, thus paving the way for detailed whole-heart simulations on future generations of leadership-class supercomputers.

Keywords: Calcium; Computational modeling; Mathematical model; Performance evaluation; Hardware; Instruction sets; Supercomputers

Improving performance portability for GPU-specific OpenCL kernels on multi-core/many-core CPUs by analysis-based transformations

Frontiers of Information Technology & Electronic Engineering, 2015, Vol. 16, No. 11, pp. 899-916

Authors: Mei WEN; Da-fei HUANG; Chang-qing XUN; Dong CHEN. School of Computer, National University of Defense Technology; National Key Laboratory of Parallel and Distributed Processing

OpenCL is an open heterogeneous programming framework. Although OpenCL programs are functionally portable, they do not provide performance portability, so code transformation often plays an irreplaceable role. When adapting GPU-specific OpenCL kernels to run on multi-core/many-core CPUs, coarsening the thread granularity is necessary and thus has been extensively used. However, locality concerns exposed in GPU-specific OpenCL code are usually inherited without analysis, which may have side effects on CPU performance. Typically, the use of OpenCL's local memory on multi-core/many-core CPUs may have the opposite of the intended performance effect, because local-memory arrays no longer match well with the hardware and the associated synchronizations are costly. To solve this dilemma, we actively analyze the memory access patterns using array-access descriptors derived from GPU-specific kernels, which can thus be adapted for CPUs by (1) removing all the unwanted local-memory arrays together with the obsolete barrier statements and (2) optimizing the coalesced kernel code with vectorization and locality re-exploitation. Moreover, we have developed an automated tool chain that performs this transformation of GPU-specific OpenCL kernels into a CPU-friendly form, accompanied by a scheduler that forms a new OpenCL runtime. Experiments show that the automated transformation can improve OpenCL kernel performance on a multi-core CPU by an average factor of 3.24. Satisfactory performance improvements are also achieved on Intel's many-integrated-core coprocessor. The resultant performance on both architectures is better than or comparable with the corresponding OpenMP performance.

Keywords: OpenCL; Performance portability; Multi-core/many-core CPU; Analysis-based transformation
