Search Results - Inner Mongolia University Library

17th IEEE International Conference on Parallel and Distributed Systems (ICPADS)

Authors: Wang, Guibin; Lin, Yisong (National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, China)

ISBN: (print) 9780769545769

Power management has become a first-order consideration in high-performance computing. Many recent studies focus on optimizing the performance of a computer system within a given power budget. However, most existing solutions adopt a fixed-period control mechanism and are transparent to the running applications. Although an application-transparent control mechanism is relatively portable, it is inefficient in accelerator-based heterogeneous parallel systems. In typical accelerator-based parallel systems, the processing units differ widely in processing speed and power consumption. Under a given power constraint, choosing which processor to slow down and how to schedule a parallel task across processors for maximum performance differs from the homogeneous case and has not been well studied. The motivating example in this paper shows that, to harness heterogeneous parallel processing efficiently, one should not only perform dynamic voltage/frequency scaling (DVFS) to meet the power budget but also tune the parallel task scheduling to adapt to the resulting changes. In this paper, we propose a heterogeneity-aware peak power management scheme that extends the existing application-transparent power controller with an application-aware power controller. First, we theoretically analyze the conditions for maximum performance under a given power budget in heterogeneous systems. Based on this result, we provide a power-constrained parallel task partition algorithm that coordinates task partitioning and voltage scaling across heterogeneous processing units to achieve optimal performance under a system power budget. Finally, we evaluate the proposed method on a typical CPU-GPU heterogeneous system and validate the superiority of the application-aware power controller over the existing method.

Keywords: Accelerator-based Systems; Peak Power Management; GPU


PS-SIM: An Execution-Driven Performance Simulation Technology Based on Process-Switch


International Conference on Advances in Computer Science, Environment, Ecoinformatics, and Education

Authors: Guo, Xiaowei; Lin, Yufei; Xu, Xinhai; Zhang, Xin (National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, China)

ISBN: (print) 9783642233234

Nowadays, the performance of large-scale parallel computer systems improves continuously, and system scale has become extremely large. Performance prediction has become an important approach to guiding system design, implementation, and optimization. Simulation is the most widely used performance prediction technique for large-scale parallel computer systems. In this paper, after analyzing the existing problems, we propose a novel execution-driven performance simulation technology based on process-switch. We design a simulation framework named PS-SIM and implement a prototype system based on MPICH2. Finally, we verify the proposed approach experimentally. The results show that the approach achieves high accuracy and simulation performance.

Keywords: Performance Prediction; Large-Scale Parallel Computer System; Execution-Driven Simulation; MPICH


Detailed and clock-driven simulation for HPC interconnection network


Frontiers of Computer Science, 2016, Vol. 10, No. 5, pp. 797-811

Authors: Wenhao ZHOU; Juan CHEN; Chen CUI; Qian WANG; Dezun DONG; Yuhua TANG (State Key Laboratory of High Performance Computing, School of Computer, National University of Defense Technology, Changsha 410073, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China)

The performance and energy consumption of high performance computing (HPC) interconnection networks are of great significance to the whole supercomputer, and building an HPC interconnection network simulation platform is very important for research on HPC software and hardware technologies. To effectively evaluate the performance and energy consumption of HPC interconnection networks, this article designs and implements a detailed, clock-driven HPC interconnection network simulation platform called HPC-NetSim. HPC-NetSim uses application-driven workloads and inherits the characteristics of the detailed and flexible cycle-accurate network simulator. In addition, it offers a large set of configurable network parameters for topology and routing, and supports on/off states for routers. We compare the simulated execution time with the real execution time of a Tianhe-2 subsystem, and the mean error is only 2.7%. We also simulate network behavior with different network structures and low-power modes. The results are consistent with the theoretical analyses.

Keywords: high performance computing; clock-driven simulation; interconnection network; BookSim


A hyper-cube based P2P information service for data grid


5th International Conference on Grid and Cooperative Computing, GCC 2006

Authors: Ren, Hao; Wang, Zhiying; Liu, Zhong (National Laboratory for Parallel and Distributed Processing, NUDT, China)

ISBN: (print) 0769526942

Many studies use the peer-to-peer (P2P) model to organize the Grid Information Service (GIS) and have shown that it can improve the scalability and reliability of the Grid environment. However, the Data Grid Information Service (DGIS) has special requirements, and not all of the P2P approaches used in GIS can be applied to DGIS. In this paper, we propose a new approach for DGIS that imposes a deterministic P2P shape based on a hypercube topology, which allows for very efficient query broadcasting. Furthermore, we propose a transposition algorithm that optimizes the overlay network's topology according to the access statistics between peers, so that peers that frequently access each other become neighbors by exchanging positions. The simulation shows that the transposition algorithm can significantly improve search efficiency. © 2006 IEEE.

Keywords: Information services
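The abstract above does not spell out its broadcast protocol, so the following is only a minimal sketch of one common way to broadcast a query over a hypercube overlay without duplicates. It assumes peers are numbered 0 to 2^dim - 1, that two peers are neighbors when their numbers differ in exactly one bit, and that a peer forwards a query only along dimensions higher than the one it arrived on. The function name `hypercube_broadcast` and its parameters are hypothetical and are not taken from the paper.

```python
from collections import deque


def hypercube_broadcast(dim, source):
    """Simulate a duplicate-free broadcast in a dim-dimensional hypercube.

    Assumption (not from the paper): a peer that received the query along
    dimension d forwards it only along dimensions d+1 .. dim-1.
    Returns (hops, messages): hop count per peer and total messages sent.
    """
    hops = {source: 0}
    messages = 0
    # Each queue entry is (peer, dimension the query arrived on);
    # the source uses -1 so that it forwards along every dimension.
    queue = deque([(source, -1)])
    while queue:
        node, arrived_on = queue.popleft()
        for d in range(arrived_on + 1, dim):
            neighbor = node ^ (1 << d)  # flip bit d to get the neighbor
            hops[neighbor] = hops[node] + 1
            messages += 1
            queue.append((neighbor, d))
    return hops, messages


if __name__ == "__main__":
    hops, messages = hypercube_broadcast(dim=4, source=5)
    print(len(hops), "peers reached with", messages, "messages")
    print("longest path:", max(hops.values()), "hops")
```

Under these assumptions every peer receives the query exactly once, so a network of 2^dim peers needs 2^dim - 1 messages and at most dim hops.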


Towards a framework for scalable model checking of concurrent C programs


2nd International Symposium on Leveraging Applications of Formal Methods, Verification and Validation, ISoLA 2006

Authors: Ji, Wang; Yi, Xiaodong; Yang, Xuejun (National Laboratory for Parallel and Distributed Processing, Changsha, China)

ISBN: (print) 0769530710

The paper presents a novel framework for scalable model checking of concurrent C programs. Built on the idea of verification reuse, it offers an integrated approach to efficient state-space reduction through abstraction, symbolic representation, and dynamic partial-order reduction (DPOR). The framework is founded on an over-approximated model of the concurrent program obtained by variable abstraction, and combines DPOR with lightweight symbolic execution to generate symbolic conditions for all locations, called α-conditions, which are intended for verification reuse. The α-conditions of a location are a weak approximation of the conditions that must be satisfied at that location to guarantee the temporal safety properties being verified. These conditions are checked in order to reuse previous exploration during verification, and are iteratively refined under the guidance of spurious counterexamples. The framework is demonstrated by several experiments, including a concurrent software system whose server and client processes are derived from the openssl-0.9.6c C source code implementing the SSL protocol. © 2007 IEEE.

Keywords: Model checking


Two improved GPU acceleration strategies for force-directed graph layout


International Conference on Computer Application and System Modeling

Authors: Wang, Yong-Xian; Li, Zong-Zhe; Yao, Lu; Cao, Wei; Wang, Zheng-Hua (National Key Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha, China)

ISBN: (print) 9781424472369

The force-directed approach is one of the most widely used methods in graph drawing research. However, its running time grows intolerably as the graph size increases, which restricts the algorithm's practicability. With the aid of a GPU (graphics processing unit) computing platform, graph layout can be sped up at low cost, but existing GPU implementations mainly employ a "one-by-one" style to update the vertex coordinates in each iteration, which has a lower convergence rate than the "batch" style commonly used in traditional CPU implementations. As a result, the aesthetics of the graph layout degrade if the total running time is restricted. It is hard to achieve both a high GPU-over-CPU speedup factor and a high convergence rate in existing GPU implementations. To partially solve this problem, this paper presents two new strategies for implementing large-scale graph layout on a CPU+GPU heterogeneous platform to accelerate force-directed layout for graph drawing. The numerical results show that our GPU implementation can dramatically improve the performance of force-directed layout: on an NVIDIA GeForce 9800 GT GPU at 1.44 GHz it is 20 times faster than on a single CPU core of an Intel Pentium 4 PC at 3.0 GHz for graphs of moderate size (typically 1000 vertices). © 2010 IEEE.

Keywords: Graphics processing unit


Simulation study of N-hit SET variation in differential cascade voltage switch logical circuits


Science China (Information Sciences), 2015, Vol. 58, No. 2, pp. 165-173

Authors: HUANG PengCheng; CHEN ShuMing; CHEN JianJun; WU ZhenYu; LIANG ZhengFa; HU ChunMei; LIANG Bin; LIU BiWei (Micro-electronics and Microprocessor Institute, College of Computer Science, National University of Defense Technology; National Laboratory for Parallel and Distributed Processing, College of Computer Science, National University of Defense Technology)

Advances in process technology raise growing concern about the single event (SE) sensitivity of differential cascade voltage switch logic (DCVSL) circuits. The simulation results indicate that the single event transient (SET) generated at a DCVSL gate is much larger than that at an ordinary CMOS gate, and that the two differ in their SET variation. Based on charge collection, this paper proposes the effective collection time theory to explain the SET pulse generated at the DCVSL gate. Through 3D TCAD mixed-mode simulation in a 65 nm twin-well bulk CMOS process, the effects on SET variation of device parameters such as well contact size and of environment parameters such as voltage are investigated.

Keywords: differential cascade voltage switch logic (DCVSL); single event transient (SET); effective collection time; pulse feedback feature (PFF); across-coupled structure


Mirror image: newfangled cell-level layout technique for single-event transient mitigation


Chinese Science Bulletin, 2014, Vol. 59, No. 23, pp. 2850-2858

Authors: Pengcheng Huang; Shuming Chen; Zhengfa Liang; Jianjun Chen; Chunmei Hu; Yibai He (Micro-electronics and Microprocessor Institute, National University of Defense Technology; National Laboratory for Parallel and Distributed Processing, National University of Defense Technology)

In recent years, the hardening of combinational circuits is becoming a common *** the transistor-level hardening technique, the cell-level hardening technique, a divide-and-conquer strategy, can make substantial use of typical characteristics of the cell-circuit module to mitigate single event transient (SET) *** mirror image (MI) technique proposed in this paper can adequately enhance charge sharing in cell circuits with a stage-by-stage inverter-like structure. 3D TCAD mixed-mode simulations have been performed in a 65 nm twin-well bulk CMOS process; the results indicate that the MI technique can reduce the SET pulse width from the anterior-stage PMOS by over 25% and the SET pulse width from the posterior-stage PMOS by about 10%. The MI technique, a representative of the cell-level technique, may be the future of combinational circuit hardening.

Keywords: single event; technique; cell; transient; mirror image; combinational logic circuit; 次布; CMOS process


Static Power Optimization for Homogeneous Multiple GPUs Based on Task Partition


2nd International Congress on Computer Applications and Computational Science (CACS 2011)

Authors: Lin, Yisong; Tang, Tao; Wang, Guibin (National Laboratory of Parallel and Distributed Processing, National University of Defense Technology, Changsha, China)

ISBN: (print) 9783642283079; 9783642283086

Recently, GPUs have been widely used in High Performance Computing (HPC). To improve computational performance, practical systems integrate several GPUs into one compute node. However, the power consumption of GPUs is very high and has become a bottleneck to their further development. As a result, optimizing power consumption has drawn broad attention in both the research community and industry. In this paper, we present an energy optimization model under a performance constraint for homogeneous multi-GPU systems, and propose a performance prediction model for a specified task partitioning policy. Experimental results validate that the model can accurately predict program execution on single or multiple GPUs, and can thus reduce static power consumption by guiding task partitioning.

Keywords: Electric power utilization


Implementation of ternary Shor's algorithm based on vibrational states of an ion in anharmonic potential


Chinese Physics B, 2015, Vol. 24, No. 3, pp. 157-165

Authors: Liu Wei (刘威); Chen Shuming (陈书明); Zhang Jian (张见); Wu Chunwang (吴春旺); Wu Wei (吴伟); Chen Pingxing (陈平形) (College of Computer, National University of Defense Technology; Science and Technology on Parallel and Distributed Processing Laboratory (PDL), National University of Defense Technology; College of Science, National University of Defense Technology)

It is widely believed that Shor's factoring algorithm provides a driving force to boost the quantum computing ***, a serious obstacle to its binary implementation is the large number of quantum gates. Non-binary quantum computing is an efficient way to reduce the required number of elemental gates. Here, we propose optimization schemes for Shor's algorithm implementation and take a ternary version for factorizing 21 as an example. The optimized factorization is achieved by a two-qutrit quantum circuit, which consists of only two single-qutrit gates and one ternary controlled-NOT gate. This two-qutrit quantum circuit is then encoded into the nine lower vibrational states of an ion trapped in a weakly anharmonic potential. Optimal control theory (OCT) is employed to derive the manipulation electric field for transferring the encoded states. The ternary Shor's algorithm can be implemented in one single step. Numerical simulation results show that the accuracy of the state transformations is about 0.9919.

Keywords: ternary Shor's algorithm; anharmonic ion trapping; optimal control theory; vibrational state
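For orientation, the classical side of Shor's algorithm for the number 21 mentioned in the abstract above can be sketched as follows. This is only an illustration of the standard reduction from factoring to order finding, computed here by brute force instead of by a quantum circuit. The base a = 2 and the function names are assumptions made for the example; the abstract does not say which base the authors use.

```python
from math import gcd


def multiplicative_order(a, n):
    """Smallest r > 0 with a**r % n == 1 (brute force; requires gcd(a, n) == 1)."""
    if gcd(a, n) != 1:
        raise ValueError("a and n must be coprime")
    r, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        r += 1
    return r


def factors_from_order(a, n):
    """Classical post-processing step: derive factors of n from the order of a.

    Returns a sorted pair of factors, or None when this base does not work
    (odd order, or a**(r/2) congruent to -1 modulo n).
    """
    r = multiplicative_order(a, n)
    if r % 2 == 1:
        return None
    half = pow(a, r // 2, n)
    if half == n - 1:
        return None
    return tuple(sorted((gcd(half - 1, n), gcd(half + 1, n))))


if __name__ == "__main__":
    # Illustrative base a = 2 (an assumption, not taken from the paper).
    print(multiplicative_order(2, 21))
    print(factors_from_order(2, 21))
```

In the quantum algorithm, only the order-finding step is delegated to the quantum circuit; the gcd computations stay classical.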
