Search Results - Inner Mongolia University Library

IEEE Conference on Industrial Electronics and Applications (ICIEA)

Authors: Yannan Yang, Yaping Liu, Zhihong Liu. Affiliations: School of Computer, National University of Defense Technology, Changsha, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China

ISBN (print): 9781467386456

Traditional identifier-locator split networks suffer from inflexibility, resistance to innovation, and deployment difficulty. SDN (Software Defined Networking) offers a new direction for designing flexible identifier-locator split networks. Recent SDN-based identifier-locator split networks use the OpenFlow switch directly by rewriting addresses, which lacks scalability and uses locator addresses ineffectively. An OpenFlow switch named IDOpenFlow is proposed to support identifier-based communication. The IDOpenFlow switch provides the communication mechanism by encapsulating packets, which scales well and uses locator addresses effectively. It encapsulates and decapsulates packets according to flow entries installed by the SDN controller. Moreover, the prototype system shows that IDOpenFlow effectively supports communication for both fixed and mobile nodes. To address software forwarding performance, a high-performance IDOpenFlow switch based on Intel DPDK, named A-IDOpenFlow, is proposed. Results from the Ixia test tool show that: 1) for packets of more than 128 bytes, the A-IDOpenFlow switch supports identifier-based communication at a rate of 10 Gbit/s; 2) for small packets of 64 bytes, the rate of A-IDOpenFlow is 7.25 times faster than that of IDOpenFlow.
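The encapsulation idea described in this abstract can be illustrated with a minimal sketch. It is not the IDOpenFlow implementation: the class names, the header layout, and the `install_flow` method are hypothetical, and the flow table is reduced to a plain identifier-to-locator dictionary standing in for entries installed by an SDN controller.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Packet:
    """Inner packet addressed by host identifiers (hypothetical layout)."""
    src_id: str
    dst_id: str
    payload: bytes


@dataclass(frozen=True)
class Encapsulated:
    """Outer header carrying locators, wrapped around the untouched packet."""
    src_loc: str
    dst_loc: str
    inner: Packet


class IdSwitch:
    """Toy edge switch; names and structure are illustrative assumptions."""

    def __init__(self, own_locator: str) -> None:
        self.own_locator = own_locator
        self.flow_table: Dict[str, str] = {}  # identifier -> locator

    def install_flow(self, dst_id: str, dst_loc: str) -> None:
        # Stands in for a flow entry pushed by the SDN controller.
        self.flow_table[dst_id] = dst_loc

    def encapsulate(self, pkt: Packet) -> Optional[Encapsulated]:
        dst_loc = self.flow_table.get(pkt.dst_id)
        if dst_loc is None:
            return None  # table miss; a real switch would consult the controller
        return Encapsulated(self.own_locator, dst_loc, pkt)

    def decapsulate(self, frame: Encapsulated) -> Optional[Packet]:
        # Only the switch that owns the destination locator unwraps the frame.
        if frame.dst_loc != self.own_locator:
            return None
        return frame.inner
```

Because the inner packet keeps its identifiers, a node that moves only requires a new identifier-to-locator entry (another `install_flow` call in this sketch); nothing inside the packet is rewritten.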

Keywords: Switches; Protocols; Software; Routing; Servers; Scalability

QSobel:A novel quantum image edge extraction algorithm

Science China (Information Sciences), 2015, Vol. 58, No. 1, pp. 107-119

Authors: ZHANG Yi, LU Kai, GAO YingHui. Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology; College of Computer, National University of Defense Technology; College of Electronic Science and Engineering, National University of Defense Technology

Edge extraction is an indispensable task in digital image processing. With the sharp increase in image data, real-time performance has become a limitation of state-of-the-art edge extraction *** this paper, QSobel, a novel quantum image edge extraction algorithm, is designed based on the flexible representation of quantum images (FRQI) and the well-known Sobel edge extraction algorithm. Because FRQI uses the superposition state of a qubit sequence to store all the pixels of an image, QSobel can calculate the Sobel gradients of the image intensity of all pixels simultaneously, which is the main reason QSobel extracts edges so quickly. By designing and analyzing the quantum circuit of QSobel, we demonstrate that QSobel can extract edges with computational complexity O(n^2) for an FRQI quantum image of size 2^n × 2^n. Compared with all classical edge extraction algorithms and the existing quantum edge extraction algorithms, QSobel can use quantum parallel computation to reach a significant and exponential ***, QSobel would resolve the real-time problem of image edge extraction.
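For reference, the classical Sobel operator that QSobel builds on can be sketched as follows. This is the ordinary per-pixel version, not the quantum circuit: it visits every interior pixel in turn, which is the sequential cost the abstract says QSobel avoids. The function name, the `|Gx| + |Gy|` magnitude approximation, and the threshold parameter are choices made for this illustration.

```python
from typing import List

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edges(img: List[List[int]], threshold: int) -> List[List[int]]:
    """Return a binary edge map; border pixels are left as 0."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    p = img[y + dy - 1][x + dx - 1]
                    gx += GX[dy][dx] * p
                    gy += GY[dy][dx] * p
            # |gx| + |gy| approximates the gradient magnitude.
            if abs(gx) + abs(gy) >= threshold:
                edges[y][x] = 1
    return edges
```

For an image with 2^n × 2^n pixels this loop performs on the order of 2^(2n) kernel evaluations, which is the baseline against which the O(n^2) circuit complexity in the abstract is compared.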

Keywords: edge extraction; quantum image processing; FRQI; Sobel; computational complexity

GPU acceleration of subgraph isomorphism search in large scale graph

Journal of Central South University, 2015, Vol. 22, No. 6, pp. 2238-2249

Authors: 杨博, 卢凯, 高颖慧, 王小平, 徐凯. Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology; College of Computer, National University of Defense Technology; Department of Electronic Science and Engineering, National University of Defense Technology

A novel framework for parallel subgraph isomorphism on GPUs, named GPUSI, is proposed. It consists of GPU region exploration and GPU subgraph matching. GPUSI iteratively enumerates subgraph instances and solves subgraph isomorphism in a divide-and-conquer fashion. The framework relies entirely on graph traversal and avoids the explicit join operation. Moreover, to improve performance, a task-queue based method and the virtual-CSR graph structure are used to balance the workload among warps, and a warp-centric programming model is used to balance the workload among threads in a warp. A prototype of GPUSI is implemented, and comprehensive experiments on various graph isomorphism operations are carried out on diverse large graphs. The experiments clearly demonstrate that GPUSI has good scalability and can achieve a speed-up of 1.4–2.6 compared to the state-of-the-art solutions.
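A sequential sketch can make the traversal-based, join-free matching idea concrete. It is not GPUSI: there is no GPU code, no task queue, and a plain CSR layout is used rather than the virtual-CSR structure the abstract mentions. The helper names are hypothetical, and the matcher counts ordered, non-induced embeddings of the pattern.

```python
from typing import List, Sequence, Set, Tuple

Csr = Tuple[List[int], List[int]]


def build_csr(n: int, edges: Sequence[Tuple[int, int]]) -> Csr:
    """Pack an undirected graph into CSR (offsets, targets) arrays."""
    adj: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    offsets, targets = [0], []
    for nbrs in adj:
        targets.extend(sorted(nbrs))
        offsets.append(len(targets))
    return offsets, targets


def neighbors(csr: Csr, v: int) -> List[int]:
    offsets, targets = csr
    return targets[offsets[v]:offsets[v + 1]]


def count_embeddings(pattern_n: int, pattern_edges: Sequence[Tuple[int, int]],
                     data: Csr, data_n: int) -> int:
    """Backtracking search: map pattern vertices 0..pattern_n-1 one at a time."""
    p_adj: List[Set[int]] = [set() for _ in range(pattern_n)]
    for u, v in pattern_edges:
        p_adj[u].add(v)
        p_adj[v].add(u)

    def extend(mapping: List[int]) -> int:
        k = len(mapping)
        if k == pattern_n:
            return 1
        total = 0
        for cand in range(data_n):
            if cand in mapping:
                continue  # keep the mapping injective
            cand_nbrs = set(neighbors(data, cand))
            # Every already-mapped pattern neighbor must be a data neighbor.
            if all(mapping[q] in cand_nbrs for q in p_adj[k] if q < k):
                total += extend(mapping + [cand])
        return total

    return extend([])
```

Each partial mapping is extended purely by reading adjacency ranges from the CSR arrays, so no intermediate result tables are joined. In a parallel setting, independent partial mappings would be natural units of work to distribute.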

Keywords: parallel graph isomorphism GPU backtrack paradigm

Hierarchical routing algorithm for ad hoc networks using mobile VMN

International Journal of Autonomous and Adaptive Communications Systems, 2016, Vol. 9, No. 1-2, pp. 40-56

Authors: Yan, Guofeng; Peng, Yuxing; Liu, Junyi; Chen, Shuhong. Affiliations: School of Computer and Communication, Hunan Institute of Engineering, Xiangtan 411101, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China; School of Computer, National University of Defense Technology, Changsha 410073, China; School of Information Science and Engineering, Central South University, Changsha 410083, China

One of the most significant challenges for routing protocols in mobile networks is coping with the unpredictable motion and unreliable behaviour of mobile nodes. In this paper, we present a hierarchical routing algorithm based on virtual mobile nodes (VMNs). A routing path can be found rapidly by exchanging path information between VMNs without accurate topology information. We discuss the routing process and the implementation details of the proposed algorithm. Finally, we evaluate the performance of our hierarchical routing algorithm through simulations, and the results show that the number of mobile WAVE generated by the routing algorithm satisfies the quality of service (QoS) requirements of all source and destination nodes. Furthermore, we compare VMN failure and message delivery ratio under hierarchical and non-hierarchical routing approaches; the results show that HRA-VMN achieves a magnitude better performance than hierarchical state routing (HSR) and the non-hierarchical routing approach. Copyright © 2016 Inderscience Enterprises Ltd.

Keywords: Routing algorithms

A data-driven mechanism for large-scale data distribution

Proceedings of the Biannual World Automation Congress

Authors: Peichang Shi, Yiying Li, Bo Ding, Longquan Jiang, Hui Liu, Jie Zhang. Affiliations: National Key Laboratory of Parallel and Distributed Processing, School of Computer Science, National University of Defense Technology, Changsha, Hunan, CN; National Key Laboratory of Parallel and Distributed Processing, School of Computer Science, National University of Defense Technology, Changsha 410073, China; National University of Defense Technology, Changsha, Hunan, CN; China Electr. Equip. & Syst. Eng. Co. Ltd., China

With the integration of physical space and cyberspace, distributing large-scale data to diverse, geographically dispersed terminals in massive numbers has become a huge challenge. When the data size exceeds what traditional technology can process and resources become limited, how to maintain user quality of service while using system resources efficiently has become an important concern. This paper presents a data-driven mechanism for large-scale data distribution that consists of four core parts: data production, data collection and pre-processing, a data analysis engine, and data consumption. It aims to extract valuable information to improve the efficiency of resource use and the accuracy of fault location in the large-scale data distribution system. The paper also studies data-driven resource scheduling optimization based on analysis of system behavior, and data-driven fault location based on analysis of the environment, which demonstrates the effectiveness of the data-driven approach for optimizing the operation of the large-scale data distribution system.

Keywords: Servers; Monitoring; Distributed databases; Big data; Real-time systems; Business

Maximizing Uniform Multicast Throughput in Multi-Channel Dense Wireless Sensor Networks

12th International Conference on Mobile Ad-Hoc and Sensor Networks, MSN 2016

Authors: Jiao, Xianlong; Chen, Guirong; Wang, Xiaodong; Chen, Yuli; Yang, Li. Affiliations: Information and Navigation College, Air Force Engineering University, Xi'an 710077, China; College of Information System and Management, National University of Defense Technology, Changsha 410073, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China; Chongqing Guanyinqiao Elementary School, Chongqing 400020, China; Chongqing Liangjiangxinqu Renhe Experimental School, Chongqing 400021, China

ISBN (print): 9781509056965

This paper investigates the problem of maximizing uniform multicast throughput (MUMT) for multi-channel dense wireless sensor networks, where all nodes are located within one-hop transmission range and can communicate with each other on multiple orthogonal channels. Networks of this kind have wide application in the real world, and maximizing uniform multicast throughput for them deserves in-depth study. Previous research has proved that the MUMT problem is NP-hard. However, previous approaches are either hard to implement or use too many relay nodes to complete the multicast task, and thus incur high overhead or poor performance. To solve the MUMT problem efficiently, we adopt the concept of the maximum independent set with a size constraint and, based on it, present a novel Single-Broadcast based Multicast algorithm called SBM. We prove that the SBM algorithm achieves a constant ratio to the theoretical throughput upper bound. Extensive experimental results demonstrate that SBM performs better than existing work in terms of both the uniform multicast throughput and the total number of transmissions. © 2016 IEEE.

Keywords: Throughput

Improving performance portability for GPU-specific OpenCL kernels on multi-core/many-core CPUs by analysis-based transformations

Frontiers of Information Technology & Electronic Engineering, 2015, Vol. 16, No. 11, pp. 899-916

Authors: Mei WEN, Da-fei HUANG, Chang-qing XUN, Dong CHEN. Affiliations: School of Computer, National University of Defense Technology; National Key Laboratory of Parallel and Distributed Processing

OpenCL is an open heterogeneous programming framework. Although OpenCL programs are functionally portable, they do not provide performance portability, so code transformation often plays an irreplaceable role. When adapting GPU-specific OpenCL kernels to run on multi-core/many-core CPUs, coarsening the thread granularity is necessary and thus has been extensively used. However, locality concerns exposed in GPU-specific OpenCL code are usually inherited without analysis, which may have side effects on CPU performance. Typically, the use of OpenCL's local memory on multi-core/many-core CPUs may have the opposite performance effect, because local-memory arrays no longer match the hardware well and the associated synchronizations are costly. To solve this dilemma, we actively analyze the memory access patterns using array-access descriptors derived from GPU-specific kernels, which can thus be adapted for CPUs by (1) removing all the unwanted local-memory arrays together with the obsolete barrier statements and (2) optimizing the coalesced kernel code with vectorization and locality re-exploitation. Moreover, we have developed an automated tool chain that transforms GPU-specific OpenCL kernels into a CPU-friendly form, accompanied by a scheduler that forms a new OpenCL runtime. Experiments show that the automated transformation can improve OpenCL kernel performance on a multi-core CPU by an average factor of 3.24. Satisfactory performance improvements are also achieved on Intel's many-integrated-core coprocessor. The resultant performance on both architectures is better than or comparable with the corresponding OpenMP performance.
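The effect of step (1) can be illustrated with a small Python simulation. It is only a sketch of the idea, not the authors' tool chain: the "kernel" (reversing each work-group's slice) and both function names are invented for this example, and work-items are simulated with ordinary loops.

```python
from typing import List


def reverse_groups_gpu_style(data: List[int], group_size: int) -> List[int]:
    """Mimics a GPU kernel that stages data in a local-memory tile."""
    if len(data) % group_size != 0:
        raise ValueError("len(data) must be a multiple of group_size")
    out = [0] * len(data)
    for start in range(0, len(data), group_size):
        local = [0] * group_size  # stands in for an OpenCL __local array
        # Phase 1: each work-item copies one element into local memory.
        for lid in range(group_size):
            local[lid] = data[start + lid]
        # A barrier(CLK_LOCAL_MEM_FENCE) would separate the phases on a GPU.
        # Phase 2: each work-item reads the mirrored slot of the tile.
        for lid in range(group_size):
            out[start + lid] = local[group_size - 1 - lid]
    return out


def reverse_groups_cpu_style(data: List[int], group_size: int) -> List[int]:
    """Coarsened form: one loop per work-group, no tile, no barrier."""
    if len(data) % group_size != 0:
        raise ValueError("len(data) must be a multiple of group_size")
    out = [0] * len(data)
    for start in range(0, len(data), group_size):
        for lid in range(group_size):
            # The local-memory read is replaced by a direct global read.
            out[start + lid] = data[start + group_size - 1 - lid]
    return out
```

Both functions compute the same result. The second drops the staging copy and the synchronization point, which is the kind of rewrite the abstract describes as removing unwanted local-memory arrays together with obsolete barriers.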

Keywords: OpenCL; Performance portability; Multi-core/many-core CPU; Analysis-based transformation

Locality Protected Dynamic Cache Allocation Scheme on GPUs

IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)

Authors: Yang Zhang, Zuocheng Xing, Li Zhou, Chunsheng Zhu. Affiliations: National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha, China; School of Electronic Science and Engineering, National University of Defense Technology, Changsha, China; Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC, Canada

ISBN (print): 9781509032068

As we approach the exascale era in supercomputing, designing a balanced computer system with powerful computing ability and low energy consumption becomes increasingly important. The GPU is a widely used accelerator in most recently deployed supercomputers. It adopts massive multithreading to hide long latencies and has high energy efficiency. In contrast to their strong computing power, GPUs have few on-chip resources, with several MB of fast on-chip memory storage per SM (Streaming Multiprocessor). GPU caches exhibit poor efficiency due to the mismatch between the throughput-oriented execution model and the cache hierarchy design. Because of the severe deficiency in on-chip memory, the benefit of the GPU's high computing capacity is dramatically reduced by poor cache performance, which limits system performance and energy efficiency. In this paper, we put forward a locality protected scheme that makes full use of data locality within the fixed capacity. We present a Locality Protected method based on instruction PC (LPP) to improve GPU performance. First, we use a PC-based collector to collect the reuse information of each cache line. After obtaining the dynamic reuse information of the cache line, we use an intelligent cache allocation unit (ICAU), which coordinates the reuse information with the LRU (Least Recently Used) replacement policy, to find the cache line with the least locality for eviction. The results show that LPP provides a speedup of up to 17.8% and an average improvement of 5.5% over the baseline method.
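One way to picture the combination of PC-based reuse information with LRU is the toy cache set below. The abstract does not specify how the ICAU weighs the two signals, so the policy here is an assumption made for illustration: evict the line whose inserting PC has the lowest observed reuse count, and fall back to LRU order on ties. The class and attribute names are hypothetical.

```python
from collections import OrderedDict, defaultdict
from typing import DefaultDict


class LocalityProtectedSet:
    """Toy model of one cache set with PC-informed victim selection."""

    def __init__(self, ways: int) -> None:
        self.ways = ways
        # addr -> PC of the instruction that brought the line in;
        # iteration order runs from least to most recently used.
        self.lines: "OrderedDict[int, int]" = OrderedDict()
        # Reuse counter per PC, playing the role of the PC-based collector.
        self.reuse_by_pc: DefaultDict[int, int] = defaultdict(int)

    def access(self, addr: int, pc: int) -> bool:
        """Return True on a hit, False on a miss."""
        if addr in self.lines:
            self.reuse_by_pc[self.lines[addr]] += 1
            self.lines.move_to_end(addr)  # refresh recency
            return True
        if len(self.lines) >= self.ways:
            # min() scans from the LRU end, so ties resolve to the LRU line.
            victim = min(self.lines,
                         key=lambda a: self.reuse_by_pc[self.lines[a]])
            del self.lines[victim]
        self.lines[addr] = pc
        return False
```

With two ways, a line that has shown reuse survives a streaming access even when it is the least recently used entry, whereas plain LRU would evict it.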

Keywords: Graphics processing units; System-on-chip; Computer architecture; Instruction sets; Resource management; Optimization; Electronic mail

Deadline-oriented task scheduling for MapReduce environments

15th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2015

Authors: Hu, Minghao; Wang, Changjian; You, Pengfei; Huang, Zhen; Peng, Yuxing. Affiliation: National Laboratory for Parallel and Distributed Processing, School of Computer Science, National University of Defense Technology, Changsha 410072, China

ISBN (print): 9783319271217

To provide timely results for ‘Big Data Analytics’, it is crucial to satisfy deadline requirements for MapReduce jobs in production environments. In this paper, we propose a deadline-oriented task scheduling approach, named Dart, to meet the given deadline and maximize the input size if only part of the dataset can be processed before the time limit. Dart uses an iterative estimation method, based on both historical data and job running status, to precisely estimate the real-time job completion time. By comparing the estimated time with the deadline constraint, a YARN-based task scheduler dynamically decides whether to continue or terminate the map *** have validated our approach using workloads from OpenCloud and Facebook on a cluster of 60 virtual machines. The results show that Dart can not only effectively meet the deadline but also process near-maximal data volumes even when the deadline is set to be extremely small and limited resources are allocated. © Springer International Publishing Switzerland 2015.
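The estimate-then-compare loop described here could look roughly like the following. This is a hypothetical sketch rather than Dart's estimator: the abstract does not give the formula, so the linear blend of a historical estimate with an extrapolation from current progress, the `weight` parameter, and the `reserve_s` allowance for later phases are all assumptions.

```python
def estimate_total_time(elapsed_s: float, done_fraction: float,
                        historical_total_s: float,
                        weight: float = 0.5) -> float:
    """Blend a historical estimate with an extrapolation from progress."""
    if not 0.0 < done_fraction <= 1.0:
        raise ValueError("done_fraction must be in (0, 1]")
    # Extrapolate total run time from the work completed so far.
    observed_total_s = elapsed_s / done_fraction
    return weight * historical_total_s + (1.0 - weight) * observed_total_s


def should_continue_map(elapsed_s: float, done_fraction: float,
                        historical_total_s: float, deadline_s: float,
                        reserve_s: float = 0.0, weight: float = 0.5) -> bool:
    """True if the map phase is expected to fit within the deadline."""
    estimate = estimate_total_time(elapsed_s, done_fraction,
                                   historical_total_s, weight)
    # reserve_s is time held back for the phases that follow the map phase.
    return estimate + reserve_s <= deadline_s
```

A scheduler would call `should_continue_map` periodically as progress reports arrive, so the estimate is refined iteratively. When it returns False, the remaining map tasks would be cut off and only the input processed so far would flow into the rest of the job.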

Keywords: MapReduce

Partial clones for stragglers in MapReduce

International Conference of Young Computer Scientists, Engineers and Educators, ICYCSEE 2015

Authors: Li, Jia; Wang, Changjian; Li, Dongsheng; Huang, Zhen. Affiliation: National Laboratory for Parallel and Distributed Processing, School of Computer Science, National University of Defense Technology, Changsha 410073, China

ISBN (print): 9783662462478

Stragglers can delay jobs and seriously reduce cluster efficiency. Much research has been devoted to solutions, such as Blacklist[8], speculative execution[1, 6], and Dolly[8]. In this paper, we put forward a new approach for mitigating stragglers in MapReduce, named Hummer. It starts task clones only for tasks at high risk of delay. Experiments show that it can decrease the risk of job delay while consuming fewer resources. For small jobs, Hummer also improves job completion time by 48% and 10% compared to LATE and Dolly, respectively. © Springer-Verlag Berlin Heidelberg 2015.

Keywords: MapReduce
