Search Results - Inner Mongolia University Library

Authors: Lu, Kai; Zhou, Xu; Bergan, Tom; Wang, Xiaoping (Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China; College of Computer, National University of Defense Technology, Changsha, China; University of Washington, Computer Science and Engineering, United States)

Multithreaded programs execute nondeterministically on conventional architectures and operating systems, which complicates many tasks, including debugging and testing. Deterministic multithreading (DMT) makes the output of a multithreaded program depend only on its inputs, which addresses this problem. However, current DMT implementations suffer from a common inefficiency: they use frequent global barriers to enforce a deterministic ordering on memory accesses. In this paper, we eliminate that inefficiency using an execution model we call deterministic lazy release consistency (DLRC). Our execution model uses the Kendo algorithm to enforce a deterministic ordering on synchronization, and it uses a deterministic version of the lazy release consistency memory model to propagate memory updates across threads. Our approach guarantees that programs execute deterministically even when they contain data races. We implemented a DMT system based on these ideas (RFDet) and evaluated it using 16 parallel applications. Our implementation targets C/C++ programs that use POSIX threads. Results show that RFDet achieves a nearly 2x speedup over DThreads, a state-of-the-art DMT system. © 2014 ACM.

Keywords: C++ (programming language)
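
The abstract above mentions a deterministic ordering on synchronization. The following is a minimal sketch of logical-clock turn-taking in the spirit of Kendo, not RFDet's implementation: the function name `deterministic_lock_order`, the input layout, and the "+1 per acquisition" clock update are assumptions made for illustration. It shows only that the acquisition order is a function of the inputs rather than of physical timing.

```python
def deterministic_lock_order(work_per_thread):
    """Return the order in which threads acquire one shared lock.

    Hypothetical model: work_per_thread[tid] lists the logical work
    (for example, counted instructions) a thread performs before each of
    its lock requests. A thread may take the lock only when its logical
    clock is the global minimum; ties are broken by thread id.
    """
    clocks = [0] * len(work_per_thread)
    next_req = [0] * len(work_per_thread)
    order = []
    while True:
        # Logical time at which each still-active thread requests the lock.
        pending = [
            (clocks[tid] + work[next_req[tid]], tid)
            for tid, work in enumerate(work_per_thread)
            if next_req[tid] < len(work)
        ]
        if not pending:
            return order
        request_time, tid = min(pending)  # smallest clock first, then smallest id
        order.append(tid)
        clocks[tid] = request_time + 1    # assumed: acquiring advances the clock by 1
        next_req[tid] += 1


print(deterministic_lock_order([[5, 1], [3, 10], [3]]))  # [1, 2, 0, 0, 1]
```

Because the order is computed from logical clocks alone, rerunning with the same work lists always yields the same sequence, whatever the real scheduler does.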

MTracer: A Trace-Oriented Monitoring Framework for Medium-Scale Distributed Systems

2014 IEEE 8th International Symposium on Service Oriented System Engineering

Authors: Jingwen Zhou, Zhenbang Chen, Haibo Mi, Ji Wang (Science and Technology on Parallel and Distributed Processing Laboratory, Changsha, China)

Trace-oriented runtime monitoring is a very effective method to improve the reliability of distributed systems. However, for medium-scale distributed systems, existing trace-oriented monitoring frameworks are either not powerful or efficient enough, or too complex and expensive to deploy and maintain. In this paper, we present MTracer, a lightweight trace-oriented monitoring system for medium-scale distributed systems. We have proposed and implemented several optimizations to improve the efficiency of the monitor server in MTracer. A web-based frontend is also provided to visualize a monitored system from different perspectives. We have validated MTracer in a real medium-scale environment. The results indicate that MTracer has very low overhead and can handle more than 4000 events per second.

Keywords: Monitoring, Servers, Optimization, Databases, Runtime, Data mining, Reliability

Location-Aware Multi-user Resource Allocation in Distributed Clouds

10th Annual Conference of Advanced Computer Architecture, ACA 2014

Authors: Li, Jiaxin; Li, Dongsheng; Zheng, Jing; Quan, Yong (National Key Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha, China; Information Center of Logistics Department, Beijing, China; School of Computer Science, National University of Defense Technology, Changsha, China)

ISBN: (print) 9783662444900

Resource allocation for multiple users across multiple data centers is an important problem in cloud computing environments. Many geographically distributed users may request virtualized resources simultaneously, and the distances from users to their allocated resources have a large impact on the quality of service (QoS) in a multiple-data-center environment. Most existing methods do not take all these factors into account when allocating resources. They usually result in poor runtime performance of users' virtual computing environments and marked differences in users' QoS. In this paper, we propose RAMD, a resource allocation algorithm based on multi-stage decision in multiple data centers. The RAMD algorithm allocates VMs to users, taking into account the correlation and interaction between multiple users, so as to minimize the sum of all users' service distances (i.e., determined by user location and the network distance of virtual machines). Experimental results show that the algorithm can effectively deal with cloud resource allocation for multiple users across multiple data centers. It can improve the runtime performance of users' virtualized resources and reduce the difference in QoS. © Springer-Verlag Berlin Heidelberg 2014.

Keywords: Resource allocation

Accelerating Embarrassingly Parallel Algorithm on Intel MIC

2014 IEEE International Conference on Progress in Informatics and Computing

Authors: Qinglin Wang, Jie Liu, Xiantuo Tang, Feng Wang, Guitao Fu, Zuocheng Xing (Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology)

The Embarrassingly Parallel (EP) algorithm, which is typical of many Monte Carlo applications, provides an estimate of the upper achievable limits for double-precision performance of parallel supercomputers. Recently, Intel released the Many Integrated Core (MIC) architecture as a many-core co-processor. MIC often offers more than 50 cores, each of which can run four hardware threads as well as 512-bit vector instructions. In this paper, we describe how the EP algorithm is accelerated effectively on platforms containing MIC using the offload execution model. The results show that the efficient implementation of the EP algorithm on MIC can take full advantage of MIC's computational resources and achieves a speedup of 3.06 compared with that on an Intel Xeon E5-2670 CPU. Based on the EP algorithm on MIC and an effective task distribution model, the implementation of the EP algorithm on a CPU-MIC heterogeneous platform achieves a performance of up to 2134.86 Mop/s and a 4.04 times speedup compared with that on the Intel Xeon E5-2670 CPU.

Keywords: NPB, embarrassingly parallel algorithm, heterogeneous platform, many integrated core architecture
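
To illustrate why this kind of Monte Carlo workload is called embarrassingly parallel, here is a simplified sketch, not the NPB reference code or the paper's MIC offload implementation. It assumes Python's `random.Random` in place of the benchmark's own generator, and the names `ep_chunk`, `ep_run`, and `NUM_ANNULI` are hypothetical. Each chunk draws uniform pairs, converts accepted pairs to Gaussian deviates with the Marsaglia polar method, and tallies them by square annulus; chunks share no state and are combined by a single reduction.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

NUM_ANNULI = 10  # assumed number of square annuli to tally


def ep_chunk(seed, num_pairs):
    """Tally Gaussian pairs for one independent chunk of the problem."""
    rng = random.Random(seed)            # private generator: no shared state
    counts = [0] * NUM_ANNULI
    for _ in range(num_pairs):
        x = 2.0 * rng.random() - 1.0
        y = 2.0 * rng.random() - 1.0
        t = x * x + y * y
        if t > 1.0 or t == 0.0:
            continue                     # reject points outside the unit disc
        factor = math.sqrt(-2.0 * math.log(t) / t)
        gx, gy = x * factor, y * factor  # two Gaussian deviates (polar method)
        annulus = int(max(abs(gx), abs(gy)))
        if annulus < NUM_ANNULI:
            counts[annulus] += 1
    return counts


def ep_run(num_chunks, pairs_per_chunk, workers=4):
    """Run all chunks independently and combine them with one reduction."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(ep_chunk, range(num_chunks),
                            [pairs_per_chunk] * num_chunks)
        return [sum(column) for column in zip(*partials)]
```

Threads are used here only to show the structure; in CPython they will not give a real speedup for this CPU-bound loop. Because each chunk is seeded by its index, the combined tallies are the same for any worker count.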

Design and Implementation of Distributed Stage DB: A High Performance Distributed Key-Value Database

2014 International Conference on Industrial Engineering and Information Technology

Authors: Hui-jun Wu, Kai Lu, Gen Li (Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology)

With the development of high performance computing and Web 2.0 applications, unstructured data storage becomes more and more *** RDBMS isn't efficient for big data ***, RDBMS's scalability is ***' expansion often leads to a large scale of data *** paper designs and implements a high performance distributed key-value database, which is distributed Stage *** servers are organized by a consistent hashing ring and distributed with the support of Zookeeper, a distributed service *** has a high single-node read/write *** route information is calculated by clients, which reduces the expense of expansion.

Keywords: distributed system, database, key-value, Zookeeper
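
The abstract above mentions servers organized by a consistent hashing ring, with route information calculated by clients. The following is a generic sketch of that technique, not the paper's code: the class name `HashRing`, the use of MD5, and the 64 virtual nodes per server are all assumptions. Each client can compute a key's owner locally, and adding a server moves only the keys that the new server takes over.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hashing ring with virtual nodes (illustrative only)."""

    def __init__(self, servers, vnodes=64):
        self._points = []    # sorted hash positions on the ring
        self._owner = {}     # hash position -> server name
        self._vnodes = vnodes  # assumed replica count per server
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(text):
        digest = hashlib.md5(text.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, server):
        """Place the server's virtual nodes on the ring."""
        for i in range(self._vnodes):
            point = self._hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def route(self, key):
        """Return the owner of key: the first point clockwise from its hash."""
        idx = bisect.bisect_right(self._points, self._hash(key))
        return self._owner[self._points[idx % len(self._points)]]


ring = HashRing(["server-a", "server-b", "server-c"])
print(ring.route("user:42"))  # one of the three server names
```

Routing needs only the sorted list of ring positions, so a client that knows the current server list can locate data without asking a central router.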

Hierarchical categorization of open source software by online profiles

Authors: Wang, Tao; Wang, Huaimin; Yin, Gang; Yang, Cheng; Li, Xiang; Zou, Peng (Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China; Department of Computer Science, University of Western Ontario, London, ON N6A 5B7, Canada; Academy of Equipment, Beijing 100000, China)

The large amount of freely available open source software on the Internet is fundamentally changing the traditional paradigms of software development. Efficient categorization of these massive numbers of projects for retrieving relevant software is of vital importance for Internet-based software development activities such as solution searching and best-practice learning. Many previous works have addressed software categorization by mining source code or byte code, but were verified on only relatively small collections of projects with coarse-grained categories or clusters. However, Internet-based software development requires finer-grained, more scalable, and language-independent categorization approaches. In this paper, we propose a novel approach to hierarchically categorize software projects based on their online profiles. We design an SVM-based categorization framework and adopt a weighted combination strategy to aggregate different types of profile attributes from multiple repositories. Different basic classification algorithms and feature selection techniques are employed and compared. Extensive experiments are carried out on more than 21,000 projects across five repositories. The results show that our approach achieves significant improvements by using weighted combination. Compared to previous work, our approach presents competitive results with a finer-grained, multi-layered category hierarchy of more than 120 categories. Unlike approaches that use source code or byte code, our approach is more effective for large-scale and language-independent software categorization. In addition, experiments suggest that hierarchical categorization combined with general keyword-based searching improves retrieval efficiency and accuracy. Copyright © 2014 The Institute of Electronics, Information and Communication Engineers.

Keywords: Open source software

Privacy preserving for network coding in smart grid

15th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2015

Authors: He, Shiming; Zeng, Weini; Xie, Kun (Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, School of Computer and Communication Engineering, Hunan Province Engineering Research Center of Electric Transportation and Smart Distributed Network, Changsha University of Science and Technology, Changsha 410114, China; The 716th Research Institute, China Shipbuilding Industry Corporation, Lianyungang 222061, China; College of Computer Science and Electronics Engineering, Hunan University, Changsha 410082, China; Department of Electrical and Computer Engineering, State University of New York at Stony Brook, New York, United States)

ISBN: (print) 9783319271361

In smart grids, the privacy implications for individuals and their families are an important issue because of fine-grained usage data collection. Many utility companies consider wireless communications for obtaining this information. Network coding is exploited in smart grids to enhance network performance in terms of throughput, delay, robustness, and energy consumption. However, network coding introduces a new challenge for privacy preservation because packets are encoded at forwarder nodes. We propose a distributed privacy-preserving scheme for network coding in smart grids, which considers the converged-flow character of the smart grid and exploits a homomorphic encryption function to reduce the complexity at forwarder nodes. The message content of each packet is encrypted, and the tag of the packet is encrypted with the homomorphic encryption function. The forwarder node then applies random linear coding to the encrypted message contents and processes the tag ciphertext directly, based on the homomorphic property. The scheme offers message content confidentiality and a privacy-preserving feature that can efficiently thwart traffic analysis. Extensive security analysis and performance evaluations demonstrate the validity and efficiency of the proposed scheme. © Springer International Publishing Switzerland 2015.

Keywords: Smart power grids

Realization and optimization DGEMM on ARMv8 64-bit multi-core processor

Dongbei Daxue Xuebao/Journal of Northeastern University, 2014, vol. 35, pp. 37-43

Authors: Jiang, Hao; Wang, Feng; Zuo, Ke; Li, Kuan; Yang, Can-Qun (College of Computer Science, National University of Defense Technology, Changsha 410073, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China)

Double-precision matrix-matrix multiplication (DGEMM) was realized and optimized on the ARMv8 64-bit multi-core processor architecture, and an optimization model that maximizes the compute-to-memory-access ratio was built to design the DGEMM kernel. The ARM 64-bit memory access instructions, cache prefetching instructions, and NEON vector FMA instructions were utilized through instruction reordering and loop unrolling to construct the kernel assembly code. The blocking and packing algorithms and parallel methods from GotoBLAS (OpenBLAS) were chosen, and the results showed that the floating-point peak efficiency can reach 82% with one thread and 80% with eight threads. As the fastest DGEMM implementation on the ARMv8 64-bit processor, it improves the peak performance by 8.3% and 16.7% compared to ATLAS. © 2014, Northeastern University. All rights reserved.

Keywords: Digital arithmetic
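
The blocking idea mentioned in the abstract above can be sketched in a few lines. This is a plain-Python illustration of loop blocking only, under the assumption of list-of-lists matrices and a hypothetical `block` parameter; it omits the packing step, the NEON FMA assembly kernel, and the multithreading that the paper describes. The point it shows is that each small block of A and B is reused many times before the loops move on, which raises the compute-to-memory-access ratio.

```python
def blocked_dgemm(a, b, block=2):
    """Compute C = A * B for list-of-lists matrices using loop blocking."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, block):
        for p0 in range(0, k, block):
            for j0 in range(0, n, block):
                # Inner kernel: multiply one block of A by one block of B
                # and accumulate into the matching block of C.
                for i in range(i0, min(i0 + block, m)):
                    row_c, row_a = c[i], a[i]
                    for p in range(p0, min(p0 + block, k)):
                        a_ip, row_b = row_a[p], b[p]
                        for j in range(j0, min(j0 + block, n)):
                            row_c[j] += a_ip * row_b[j]
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(blocked_dgemm(a, b))  # [[58.0, 64.0], [139.0, 154.0]]
```

In a tuned library the block sizes are chosen to fit the cache levels and registers of the target processor; the value 2 here is only small enough to exercise the edge handling on a tiny example.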

The acceleration of turbo decoder on the newest GPGPU of Kepler architecture

International Symposium on Communications and Information Technologies (ISCIT)

Authors: Yang Zhang, Zuocheng Xing, Luechao Yuan, Cang Liu, Qinglin Wang (Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China)

ISBN: (print) 9781479944156

In this paper, a new implementation of a 3GPP LTE standards-compliant turbo decoder based on GPGPU is proposed. It uses the newest GPU, the Tesla K20c, which is based on the Kepler GK110 architecture. The new architecture has more powerful parallel computing capability, and we use it to fully exploit the parallelism in the turbo decoding algorithm in novel ways. Meanwhile, we use the various levels of the memory hierarchy to meet different data demands on speed and capacity. Simulation shows that our implementation is practical and achieves a 76% improvement in throughput over the latest GPU implementation. The result demonstrates that the newest Kepler architecture is suitable for turbo decoding and can be a promising reconfigurable platform for communication systems.

Keywords: Decoding, Graphics processing units, Parallel processing, Computer architecture, Throughput, Kernel, Bit error rate

Experimental verification of the parasitic bipolar amplification effect in PMOS single event transients

Chinese Physics B, 2014, vol. 23, no. 7, pp. 775-779

Authors: He Yi-Bai, Chen Shu-Ming (College of Computer, National University of Defense Technology; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology)

The contribution of parasitic bipolar amplification to SETs is experimentally verified using two P-hit target chains in the normal layout and in the special layout. For PMOSs in the normal layout, the single-event charge collection is composed of diffusion, drift, and the parasitic bipolar effect, while for PMOSs in the special layout, the parasitic bipolar junction transistor cannot turn on. Heavy ion experimental results show that PMOSs without parasitic bipolar amplification have a 21.4% decrease in the average SET pulse width and roughly a 40.2% reduction in the SET cross-section.

Keywords: heavy ion experiment, amplification effect, single event, bipolar, parasitic, PMOS, verification, transient
