Search results - Inner Mongolia University Library

MPtoStream: an OpenMP compiler for CPU-GPU heterogeneous parallel systems

Science China (Information Sciences), 2012, Vol. 55, No. 9, pp. 1961-1971

Authors: YANG XueJun, TANG Tao, WANG GuiBin, JIA Jia & XU XinHai. National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha 410073, China

In light of GPUs’ powerful floating-point operation capacity, heterogeneous parallel systems incorporating general purpose CPUs and GPUs have become a highlight in the research field of high performance computing (HPC). However, due to the complexity of programming on GPUs, porting a large number of existing scientific computing applications to the heterogeneous parallel systems remains a big *** OpenMP programming interface is widely adopted on multi-core CPUs in the field of scientific *** effectively inherit existing OpenMP applications and reduce the transplant cost, we extend OpenMP with a group of compiler directives, which explicitly divide tasks among the CPU and the GPU, and map time-consuming computing fragments to run on the GPU, thus dramatically simplifying the *** have designed and implemented MPtoStream, a compiler of the extended OpenMP for AMD’s stream processing *** experimental results show that programming with the extended directives deviates from programming with OpenMP by less than 11% modification and achieves significant speedup ranging from 3.1 to 17.3 on a heterogeneous system, incorporating an Intel Xeon E5405 CPU and an AMD FireStream 9250 GPU, over the execution on the Xeon CPU alone.

Keywords: GPGPU, stream, OpenMP, compiler


Comparison of heavy-ion induced SEU for D- and TMR-flip-flop designs in 65-nm bulk CMOS technology


Science China (Information Sciences), 2014, Vol. 57, No. 10, pp. 223-229

Authors: HE YiBai, CHEN ShuMing. School of Computer Science, National University of Defense Technology; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology

Heavy-ion experiments were performed on a D flip-flop (DFF) and a TMR flip-flop (TMRFF) fabricated in a 65-nm bulk CMOS process. The experimental results show that the TMRFF has about a 92% decrease in SEU cross-section compared to the standard DFF design in static test mode. In dynamic test mode, the TMRFF shows a much stronger frequency dependency than the DFF design, which reduces its advantage over the DFF at higher operating frequencies. At 160 MHz, the TMRFF is only 3.2× harder than the standard DFF. Such a small improvement in the SEU performance of the TMR design may warrant reconsideration of its use in hardening design.
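For readers unfamiliar with triple modular redundancy (TMR), the following minimal sketch shows the 2-of-3 majority vote that lets a TMR storage element mask a single upset. It is a logical illustration only; the function names are hypothetical and it models neither the circuit in the paper nor the frequency effects the abstract reports.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three redundant copies."""
    return (a & b) | (b & c) | (a & c)


def tmr_read(copies):
    """Return the voted value of three redundant copies (hypothetical helper)."""
    a, b, c = copies
    return majority(a, b, c)


# Three redundant copies of one stored bit.
stored = [1, 1, 1]

# A single upset flips one copy; the vote still returns the original bit.
stored[1] ^= 1
single_upset_value = tmr_read(stored)

# A second upset in another copy defeats the vote.
stored[2] ^= 1
double_upset_value = tmr_read(stored)
```

In this toy model one flipped copy is masked, while two flipped copies produce a wrong output, which is why the vote protects only against single upsets.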

关键词： SEU flip-flop TMR heavy-ion frequency

来源：评论

学校读者我要写书评

暂无评论

Heterogeneity-aware Peak Power Management for Accelerator-based Systems


17th IEEE International Conference on Parallel and Distributed Systems (ICPADS)

Authors: Wang, Guibin; Lin, Yisong. National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, China

ISBN: (print) 9780769545769

Power management has become one of the first-order considerations in the high performance computing field. Many recent studies focus on optimizing the performance of a computer system within a given power budget. However, most existing solutions adopt a fixed-period control mechanism and are transparent to the running applications. Although the application-transparent control mechanism has relatively good portability, it exhibits low efficiency in accelerator-based heterogeneous parallel systems. In typical accelerator-based parallel systems, different processing units have largely different processing speeds and power consumption. Under a given power constraint, how to choose the processor to be slowed down and how to schedule a parallel task onto different processors for maximum performance differ from the homogeneous case and have not been well studied. The motivating example in this paper shows that, in order to efficiently harness heterogeneous parallel processing, one should not only perform dynamic voltage/frequency scaling (DVFS) to meet the power budget, but also tune the parallel task scheduling to adapt to the changes. In this paper, we propose a heterogeneity-aware peak power management method, which extends the existing application-transparent power controller with an application-aware power controller. First, we theoretically analyze the conditions for maximum performance given a power budget for heterogeneous systems. Based on this result, we provide a power-constrained parallel task partition algorithm, which coordinates parallel task partition and voltage scaling for heterogeneous processing units to achieve the optimal performance given a system power budget. Finally, we evaluate the proposed method on a typical CPU-GPU heterogeneous system, and validate the superiority of the application-aware power controller over the existing method.
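The idea of coordinating task partition with frequency scaling under a power cap can be sketched with a toy model. Everything below is an assumption made for illustration: each device offers a few discrete operating points given as (throughput, power) pairs, the numbers are invented, and the work is assumed perfectly divisible so that the best split is proportional to device throughput. This is not the algorithm from the paper.

```python
from itertools import product


def best_config(cpu_levels, gpu_levels, budget):
    """Pick one operating point per device under a total power budget.

    Each level is a (throughput, power) pair. With perfectly divisible work,
    finish time is minimised when both devices finish together, so the best
    feasible pair is the one with the largest combined throughput.
    """
    best = None
    for (sc, pc), (sg, pg) in product(cpu_levels, gpu_levels):
        if pc + pg > budget:
            continue  # this pair would exceed the power cap
        total = sc + sg
        if best is None or total > best[0]:
            best = (total, sc, sg, pc + pg)
    if best is None:
        return None  # no feasible pair under this budget
    total, sc, sg, power = best
    return {
        "cpu_speed": sc,
        "gpu_speed": sg,
        "gpu_share": sg / total,  # fraction of the work given to the GPU
        "power": power,
    }


# Hypothetical operating points: (throughput, power in watts).
cpu_levels = [(10, 40), (15, 60), (20, 90)]
gpu_levels = [(40, 80), (60, 120), (80, 180)]

config = best_config(cpu_levels, gpu_levels, budget=200)
```

With these invented numbers the search settles on the middle operating point for both devices and assigns 80% of the work to the GPU. Running both at full speed would exceed the cap, and slowing only one device gives lower combined throughput.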

关键词： Accelerator-based Systems Peak Power Management GPU

来源：评论

学校读者我要写书评

暂无评论

A fast successive over-relaxation algorithm for force-directed network graph drawing


Science China (Information Sciences), 2012, Vol. 55, No. 3, pp. 677-688

Authors: WANG YongXian & WANG ZhengHua. National Key Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha 410073, China

The force-directed approach is one of the most widely used methods in graph drawing research. There are two main problems with traditional force-directed algorithms. First, there is no mature theory to ensure the convergence of the iteration sequence used in the algorithm, and it is hard to estimate the rate of convergence even when convergence is achieved. Second, the running time grows intolerably when drawing large-scale graphs, which limits the advantages of the force-directed approach in practice. This paper focuses on these problems and presents a sufficient condition for ensuring the convergence of the iterations. We then develop a practical heuristic algorithm for speeding up the iteration in the force-directed approach using a successive over-relaxation (SOR) strategy. The results of computational tests on several benchmark graph datasets widely used in graph drawing research show that our algorithm can dramatically improve the performance of the force-directed approach by decreasing both the number of iterations and the running time, and is 1.5 times faster than the original approach on average.
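As background on the SOR strategy named in the abstract, here is a minimal sketch of textbook successive over-relaxation applied to a small linear system. It is not the paper's graph-drawing algorithm; the function name, the relaxation factor and the test matrix are assumptions chosen for illustration.

```python
def sor_solve(A, b, omega=1.25, tol=1e-10, max_iter=10_000):
    """Solve A x = b by successive over-relaxation.

    omega = 1 reduces to Gauss-Seidel; 1 < omega < 2 over-relaxes each
    update. Returns the solution and the number of sweeps used.
    """
    n = len(b)
    x = [0.0] * n
    for sweep in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(n):
            # Gauss-Seidel value for x[i], using the newest neighbours.
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            gs_value = (b[i] - sigma) / A[i][i]
            # Blend the old value and the Gauss-Seidel value with weight omega.
            new_value = (1.0 - omega) * x[i] + omega * gs_value
            max_change = max(max_change, abs(new_value - x[i]))
            x[i] = new_value
        if max_change < tol:
            return x, sweep
    raise RuntimeError("SOR did not converge within max_iter sweeps")


# Small symmetric positive definite system whose exact solution is [1, 2, 3].
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]

solution, sweeps = sor_solve(A, b)
```

For symmetric positive definite systems SOR converges for any relaxation factor strictly between 0 and 2. The best factor depends on the problem, so the default used here is only a placeholder.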

Keywords: graph drawing, graph layout, successive over-relaxation, force-directed algorithm


Mobility of internet-based virtual computing environment


15th International Conference on Parallel and Distributed Systems, ICPADS '09

Authors: Shen, Siqi; Wang, Ji; Shen, Rui; Zhang, Shengdong; Fan, Pei. National Laboratory for Parallel and Distributed Processing, Changsha 410073, China

ISBN: (print) 9780769539003

The Internet-based Virtual Computing Environment (iVCE) provides on-demand aggregation and autonomic collaboration mechanisms to facilitate the utilization of autonomous and dynamic Internet resources. Load balancing and fault tolerance are important issues when scheduling those transient resources. In this paper, we propose a mobility mechanism for the migration of various roles of agents in the iVCE platform. The mobility mechanism involves two parts of the iVCE platform: role container layer and event service layer. At the role container layer, a novel approach is proposed to handle the code and data mobility issue. At the event service layer, an efficient routing reconfiguration protocol is proposed based on a publish/subscribe system over DHTs to facilitate task migrations. Certain conditions must be satisfied before the migration of an agent to ensure the correctness of the whole process. Experiments are conducted to evaluate the performance of the mobility mechanism, and the experimental results show that it is suitable for implementing load balancing and fault tolerance in the iVCE. © 2009 IEEE.

Keywords: Fault tolerance


PS-SIM: An Execution-Driven Performance Simulation Technology Based on Process-Switch


International Conference on Advances in Computer Science, Environment, Ecoinformatics, and Education

Authors: Guo, Xiaowei; Lin, Yufei; Xu, Xinhai; Zhang, Xin. National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, China

ISBN: (print) 9783642233234

The performance of large-scale parallel computer systems improves continuously, and system scale has become extremely large. Performance prediction has become an important approach to guide system design, implementation and optimization. Simulation is the most widely used performance prediction technology for large-scale parallel computer systems. In this paper, after analyzing the existing problems, we propose a novel execution-driven performance simulation technology based on process-switch. We designed a simulation framework named PS-SIM, and implemented a prototype system based on MPICH2. Finally, we verified the proposed approach by experiments. Experimental results show that the approach has high accuracy and simulation performance.

Keywords: Performance Prediction, Large-Scale Parallel Computer System, Execution-Driven Simulation, MPICH


Performance Evaluation of Different Data Value Prediction Schemes


Journal of Computer Science & Technology, 2005, Vol. 20, No. 5, pp. 615-623

Authors: Yong Xiao, Xing-Ming Zhou. National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha 410073, P.R. China

Data value prediction has been widely accepted as an effective mechanism to break data hazards in high performance processor design. Several works have reported promising performance potential. However, little information comparing the performance of these prediction mechanisms has been presented in a clear way. This paper investigates the performance impact of four previously proposed value predictors, namely the last value predictor, the stride value predictor, the two-level value predictor and the hybrid (stride+two-level) predictor. The impact of the misprediction penalty, which has frequently been ignored, is discussed in detail. Several other implementation issues, including instruction window size, issue width and branch predictor, are also addressed and simulated. Simulation results indicate that data value predictors act differently under different configurations. In some cases, simpler schemes may be more beneficial than complicated ones. In some particular cases, value prediction may have a negative impact on performance.
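To make the first two predictor types concrete, the sketch below implements a last value predictor and a stride value predictor as simple tables indexed by instruction address, and measures their hit rate on a made-up trace. The class names, the unbounded table and the trace are assumptions for illustration. The two-level and hybrid predictors, and the misprediction penalty studied in the paper, are not modelled.

```python
class LastValuePredictor:
    """Predicts that an instruction produces the same value as last time."""

    def __init__(self):
        self.table = {}  # pc -> last value seen

    def predict(self, pc):
        return self.table.get(pc)  # None means "no prediction yet"

    def update(self, pc, value):
        self.table[pc] = value


class StridePredictor:
    """Predicts last value plus the difference between the last two values."""

    def __init__(self):
        self.table = {}  # pc -> (last value, stride)

    def predict(self, pc):
        entry = self.table.get(pc)
        if entry is None:
            return None
        last, stride = entry
        return last + stride

    def update(self, pc, value):
        entry = self.table.get(pc)
        stride = value - entry[0] if entry is not None else 0
        self.table[pc] = (value, stride)


def accuracy(predictor, trace):
    """Fraction of (pc, value) pairs in the trace that were predicted exactly."""
    hits = 0
    for pc, value in trace:
        if predictor.predict(pc) == value:
            hits += 1
        predictor.update(pc, value)
    return hits / len(trace)


# Hypothetical trace: one instruction producing 0, 8, 16, ..., 72
# (for example an address incremented inside a loop).
loop_trace = [(0x40, v) for v in range(0, 80, 8)]

last_value_rate = accuracy(LastValuePredictor(), loop_trace)
stride_rate = accuracy(StridePredictor(), loop_trace)
```

On this trace the last value predictor never hits, while the stride predictor hits on every value after its two warm-up observations. On a constant-valued trace both would behave alike, a small instance of the abstract's point that predictors act differently under different conditions.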

Keywords: data value predictors, performance impact, simulation


Computing Must and May Alias to Detect Null Pointer Dereference


Authors: Ma, Xiaodong; Wang, Ji; Dong, Wei. National Laboratory for Parallel and Distributed Processing, China

ISBN: (print) 3540884785

This paper presents a novel algorithm to detect null pointer dereference errors. The algorithm utilizes both must and may alias information in a compact way to improve the precision of the detection. Using may alias information obtained by a fast flow- and context-insensitive analysis algorithm, we compute the must alias information generated by the assignment statements, and the must alias information is in turn used to improve the precision of the may alias information. We can apply strong updates to more expressions using the must alias information, which reduces the false positives of null pointer dereference detection. We have implemented our algorithm in the SUIF2 compiler infrastructure, and the experimental results are as expected. © Springer-Verlag Berlin Heidelberg 2008.
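The difference between a strong and a weak update, which is where must alias information pays off, can be sketched with a toy abstract state. All names and the representation below are hypothetical. The sketch assumes that a pointer with exactly one possible target is a must alias of that target, which a real analysis would have to establish separately. It is not the algorithm from the paper.

```python
NULL, NONNULL = "null", "nonnull"


def store_through(state, targets, value):
    """Model the statement `*p = value`.

    `state` maps each variable to the set of nullness values it may hold;
    `targets` is the set of variables that p may point to.
    """
    new_state = {var: set(vals) for var, vals in state.items()}
    if len(targets) == 1:
        # Assumed must alias: the old contents are definitely overwritten,
        # so replace them (strong update).
        (var,) = targets
        new_state[var] = {value}
    else:
        # Only may alias: any target might or might not be written,
        # so the old contents must be kept (weak update).
        for var in targets:
            new_state[var].add(value)
    return new_state


def may_be_null(state, var):
    """True if a dereference of `var` would be reported as possibly null."""
    return NULL in state[var]


# Both variables start out null.
state = {"x": {NULL}, "y": {NULL}}

# p must point to x: after `*p = nonnull`, x is known to be non-null.
after_strong = store_through(state, {"x"}, NONNULL)

# p may point to x or y: x might still be null, so a warning remains.
after_weak = store_through(state, {"x", "y"}, NONNULL)
```

In the weak case the checker still warns on a dereference of `x` even if `p` pointed to `x` at run time. That is the kind of false positive the abstract says must alias information helps remove.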

Keywords: Information use


Improve OpenMP performance by extending BARRIER and REDUCTION constructs


5th International Symposium on High Performance Computing, ISHPC 2003

Authors: Chun, Huang; Xuejun, Yang. National Laboratory for Parallel and Distributed Processing, China

ISBN: (print) 3540203591

Barrier synchronization and reduction are global operations used frequently in large-scale OpenMP programs. To improve OpenMP performance, we present two new directives, BARRIER(0) and ALLREDUCTION, to extend the BARRIER and REDUCTION constructs in the OpenMP API. The new extensions have been implemented in our portable OpenMP compiler on JIAJIA. Benchmark testing and experiments show that these constructs significantly decrease the system overheads from synchronization, reduction operations and access to reduction variables on SDSM systems. It is predictable that performance improvements can also be obtained on ccNUMA systems. © Springer-Verlag Berlin Heidelberg 2003.

Keywords: Application programming interfaces (API)


An optimized method for automatic test oracle generation from real-time specification


10th IEEE International Conference on Engineering of Complex Computer Systems, ICECCS 2005

Authors: Wang, Xin; Qi, Zhi-Chang; Li, Shuhao. National Laboratory for Parallel and Distributed Processing, Changsha 410073, China

Test oracles are widely used to verify whether a system under test is running as desired. Since the correctness of real-time systems depends on both the logical results of the computation and the time at which the results are produced, an optimized model checking-based method for test oracle generation is proposed to check whether the system traces satisfy their real-time specifications at run time. Inspired by the idea of real-time model checking, the test oracles can be automatically generated from their specifications in the real-time logic MITL[0,d] in a simpler way and modelled by a variant of Timed Automata. Assertions are chosen to acquire the traces of real-time systems. A case study is presented to demonstrate the usefulness of the method proposed in this paper. © 2005 IEEE.
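As a rough picture of what a run-time oracle for a time-bounded property does, the sketch below checks one common pattern on a timestamped trace: every trigger event must be followed by a response within d time units. The monitor is written by hand for this single pattern. The function name, event names and trace are hypothetical, and nothing here is generated from an MITL[0,d] formula or a timed automaton as in the paper.

```python
def check_bounded_response(trace, trigger, response, d):
    """Check "every trigger is followed by a response within d time units".

    `trace` is a list of (timestamp, event) pairs sorted by timestamp.
    Returns (violations, pending): timestamps of triggers that missed their
    deadline, and of triggers still waiting when the finite trace ended.
    """
    violations = []
    pending = []  # timestamps of triggers not yet answered
    for t, event in trace:
        # Any waiting trigger older than d has missed its deadline.
        violations.extend(t0 for t0 in pending if t - t0 > d)
        pending = [t0 for t0 in pending if t - t0 <= d]
        if event == response:
            pending = []  # one response answers every trigger still in time
        elif event == trigger:
            pending.append(t)
    return violations, pending


# Hypothetical trace: the second request is answered too late and the
# third is still unanswered when the trace ends.
trace = [
    (0.0, "req"),
    (1.5, "ack"),
    (4.0, "req"),
    (7.5, "ack"),
    (9.0, "req"),
]

violations, pending = check_bounded_response(trace, "req", "ack", d=2.0)
```

Triggers left pending at the end of a finite trace are reported separately rather than as failures, because the trace alone cannot show whether their deadline would have been met.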

Keywords: Computer software selection and evaluation
