Determinism is very useful to multithreaded programs in debugging, testing, and other tasks. Many deterministic approaches have been proposed, such as deterministic multithreading (DMT) and deterministic replay. However, these systems are either inefficient or target a single purpose, which is not flexible. In this paper, we propose an efficient and flexible deterministic framework for multithreaded programs. Our framework implements determinism in two steps: relaxed determinism and strong determinism. Relaxed determinism solves data races efficiently by using a proper weak memory consistency model. After that, we implement strong determinism by solving lock contentions deterministically. Since we can apply different approaches for these two steps independently, our framework provides a spectrum of deterministic choices, including a nondeterministic system (fast), a weak deterministic system (fast and conditionally deterministic), a DMT system, and a deterministic replay system. Our evaluation shows that the DMT configuration of this framework can even outperform a state-of-the-art DMT system.
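The core of the second step is making contended lock acquisitions independent of the OS scheduler. As a minimal sketch only, assuming a simple round-robin turn scheme rather than the paper's actual mechanism (the class name DeterministicLock, the fixed thread count, and the workload are invented for illustration), the C++ snippet below shows how a turn token can force the same acquisition order on every run:

```cpp
// Sketch: deterministic lock acquisition via round-robin turns.
// This is an illustrative assumption, not the paper's implementation.
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

class DeterministicLock {
public:
    explicit DeterministicLock(int num_threads) : num_threads_(num_threads) {}

    // Block until it is `tid`'s turn; the holder keeps the turn while inside.
    void lock(int tid) {
        std::unique_lock<std::mutex> lk(turn_mutex_);
        turn_cv_.wait(lk, [&] { return turn_ == tid; });
    }

    void unlock(int tid) {
        std::unique_lock<std::mutex> lk(turn_mutex_);
        turn_ = (tid + 1) % num_threads_;   // deterministic hand-off
        turn_cv_.notify_all();
    }

private:
    const int num_threads_;
    int turn_ = 0;                          // which thread may enter next
    std::mutex turn_mutex_;
    std::condition_variable turn_cv_;
};

int main() {
    constexpr int kThreads = 4;
    DeterministicLock lock(kThreads);
    long long counter = 0;

    std::vector<std::thread> workers;
    for (int tid = 0; tid < kThreads; ++tid) {
        workers.emplace_back([&, tid] {
            for (int i = 0; i < 1000; ++i) {
                lock.lock(tid);
                ++counter;          // critical section entered in the same
                lock.unlock(tid);   // thread order on every execution
            }
        });
    }
    for (auto& t : workers) t.join();
    std::printf("counter = %lld\n", counter);   // always 4000
}
```

Because the turn only advances when the current holder releases it, the order in which threads enter the critical section does not depend on how the OS schedules them; this is the kind of property the strong-determinism step builds on once data races are already handled by the relaxed step.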
Semi-supervised learning (SSL) exploits abundant unlabeled examples to boost the performance of learning from limited labeled examples. Due to its strong discriminative power, SSL has been widely applied to various real...
Detecting concurrency bugs is becoming increasingly important. Many pattern-based concurrency bug detectors focus on the specific types of interleavings that are correlated to concurrency bugs. To detect multiple type...
One of the most significant challenges for routing protocols in mobile networks is coping with the unpredictable motion and the unreliable behaviour of mobile nodes. In this paper, we present a hierarchical r...
Regenerating codes have been proposed to achieve an optimal trade-off curve between the amount of storage space and the network traffic for repair. However, existing repair schemes based on regenerating codes are inad...
To provide timely results for ‘Big Data Analytics’, it is crucial to satisfy deadline requirements for MapReduce jobs in production environments. In this paper, we propose a deadline-oriented task scheduling approac...
Moving object detection is important in traffic video analysis, and an increasing number of algorithms are being applied to it. Most of these algorithms are time-consuming and cannot satisfy real-time...
Stragglers can delay jobs and seriously reduce cluster efficiency. Much research has been devoted to this problem, such as Blacklist[8], speculative execution[1, 6], and Dolly[8]. In this paper, we put forward ...
With the rapid growth of open source software, choosing software from many alternatives has become a great challenge. Traditional ranking approaches mainly focus on the characteristics of the software themselves, such...
ISBN: (print) 9781509032068
As we are approaching the exascale era in supercomputing, designing a balanced computer system with powerful computing ability and low energy consumption becomes increasingly important. The GPU is a widely used accelerator in most recently deployed supercomputers. It adopts massive multithreading to hide long latency and has high energy efficiency. In contrast to their strong computing power, GPUs have few on-chip resources, with several MB of fast on-chip memory storage per SM (Streaming Multiprocessor). GPU caches exhibit poor efficiency due to the mismatch between the throughput-oriented execution model and the cache hierarchy design. Because of this severe deficiency in on-chip memory, the benefit of the GPU's high computing capacity is dramatically reduced by poor cache performance, which limits system performance and energy efficiency. In this paper, we put forward a locality protection scheme that makes full use of data locality within the fixed cache capacity. We present a Locality Protected method based on instruction PC (LPP) to improve GPU performance. First, we use a PC-based collector to gather the reuse information of each cache line. After obtaining the dynamic reuse information of a cache line, an intelligent cache allocation unit (ICAU) coordinates the reuse information with the LRU (Least Recently Used) replacement policy to identify the cache line with the least locality for eviction. The results show that LPP provides up to a 17.8% speedup and an average improvement of 5.5% over the baseline method.
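To make the eviction flow concrete, here is a small software model, a sketch under assumed parameters (a 4-way set, a PC-indexed reuse table, and a synthetic streaming-versus-reused workload; none of these names or sizes come from the paper), of how per-PC reuse counts gathered on hits can be combined with LRU to pick the line with the least locality as the victim:

```cpp
// Illustrative model only, not the paper's hardware design: each cache line
// remembers the PC that filled it, a PC-indexed table counts reuses, and the
// victim is the line whose PC shows the least reuse, with LRU as tie-breaker.
#include <array>
#include <cstdint>
#include <cstdio>
#include <unordered_map>

struct Line {
    bool     valid = false;
    uint64_t tag   = 0;
    uint64_t pc    = 0;   // PC of the access that filled this line
    uint64_t lru   = 0;   // smaller = older
};

class Set {
public:
    // Returns true on a hit. `tick` is a global access counter used for LRU.
    bool access(uint64_t tag, uint64_t pc, uint64_t tick,
                std::unordered_map<uint64_t, int>& reuse_table) {
        for (auto& l : ways_) {
            if (l.valid && l.tag == tag) {
                ++reuse_table[l.pc];   // PC-based collector: record a reuse
                l.lru = tick;
                return true;
            }
        }
        // Miss: pick the victim with the least PC reuse, then the oldest LRU.
        Line* victim = &ways_[0];
        for (auto& l : ways_) {
            if (!l.valid) { victim = &l; break; }
            int rv = reuse_table[victim->pc];
            int rl = reuse_table[l.pc];
            if (rl < rv || (rl == rv && l.lru < victim->lru)) victim = &l;
        }
        victim->valid = true;
        victim->tag = tag;
        victim->pc = pc;
        victim->lru = tick;
        return false;
    }

private:
    std::array<Line, 4> ways_;   // 4-way set, chosen only for illustration
};

int main() {
    Set set;
    std::unordered_map<uint64_t, int> reuse_table;   // per-PC reuse counts
    uint64_t tick = 0;
    int hits = 0, total = 0;

    auto touch = [&](uint64_t tag, uint64_t pc) {
        hits += set.access(tag, pc, ++tick, reuse_table);
        ++total;
    };

    // A PC (0x200) whose three lines are reused every round competes with a
    // streaming PC (0x100) that never reuses its lines; the reuse-aware
    // victim choice keeps the reusable lines resident.
    for (int round = 0; round < 100; ++round) {
        for (uint64_t t = 0; t < 3; ++t) touch(t, 0x200);
        touch(1000 + round, 0x100);
    }
    std::printf("hit rate: %.2f\n", static_cast<double>(hits) / total);
}
```

In this toy run the lines installed by the reused PC stay resident while the streaming PC's lines are sacrificed, which is the qualitative behavior the PC-based collector and the ICAU are described as achieving.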