Search results - Inner Mongolia University Library

Simulation study of N-hit SET variation in differential cascade voltage switch logical circuits

Science China (Information Sciences), 2015, Vol. 58, No. 2, pp. 165-173

Authors: HUANG PengCheng, CHEN ShuMing, CHEN JianJun, WU ZhenYu, LIANG ZhengFa, HU ChunMei, LIANG Bin, LIU BiWei. Affiliations: Micro-electronics and Microprocessor Institute, College of Computer Science, National University of Defense Technology; National Laboratory for Parallel and Distributed Processing, College of Computer Science, National University of Defense Technology

Advances in process technology raise concern about the Single Event (SE) sensitivity of Differential Cascade Voltage Switch Logic (DCVSL) circuits. Simulation results indicate that the Single Event Transient (SET) generated at a DCVSL gate is much larger than that at an ordinary CMOS gate, and that its SET variation is different. Based on charge collection, this paper proposes the effective collection time theory to describe the SET pulse generated at the DCVSL gate. Through 3D TCAD mixed-mode simulation in a 65 nm twin-well bulk CMOS process, the effects on SET variation of device parameters such as well contact size and of environmental parameters such as voltage are investigated.

Keywords: differential cascade voltage switch logic (DCVSL) single event transient (SET) effective collection time pulse feedback feature (PFF) across-coupled structure

Mirror image: newfangled cell-level layout technique for single-event transient mitigation

Chinese Science Bulletin, 2014, Vol. 59, No. 23, pp. 2850-2858

Authors: Pengcheng Huang, Shuming Chen, Zhengfa Liang, Jianjun Chen, Chunmei Hu, Yibai He. Affiliations: Micro-electronics and Microprocessor Institute, National University of Defense Technology; National Laboratory for Parallel and Distributed Processing, National University of Defense Technology

In recent years, the hardening of combinational circuits is becoming a common *** the transistor-level hardening technique, the cell-level hardening technique, a divide-and-conquer strategy, can make substantial use of typical characteristics of the cell-circuit module to mitigate single event transient (SET) *** mirror image (MI) technique proposed in this paper can adequately enhance charge sharing in cell circuits with a stage-by-stage inverter-like structure. 3D TCAD mixed-mode simulations have been performed in a 65 nm twin-well bulk CMOS process; the results indicate that the MI technique can reduce the SET pulse width from the anterior-stage PMOS by over 25% and the SET pulse width from the posterior-stage PMOS by about 10%. The MI technique, a representative of the cell-level technique, may be the future of the hardening of combinational circuits.

Keywords: single event technique cell transient mirror image combinational logic circuit layout CMOS process

Static Power Optimization for Homogeneous Multiple GPUs Based on Task Partition

2nd International Congress on Computer Applications and Computational Science (CACS 2011)

Authors: Lin, Yisong; Tang, Tao; Wang, Guibin. Affiliation: National Laboratory of Parallel and Distributed Processing, National University of Defense Technology, Changsha, China

ISBN: (print) 9783642283079; 9783642283086

Recently, GPUs have been widely used in High Performance Computing (HPC). To improve computational performance, several GPUs are integrated into one compute node in practical systems. However, the power consumption of GPUs is very high and has become a bottleneck to their further development. Consequently, optimizing power consumption has drawn broad attention in the research and industry communities. In this paper, we present an energy optimization model that considers a performance constraint for homogeneous multi-GPU systems, and propose a performance prediction model for a specified task partitioning policy. Experimental results validate that the model can accurately predict program execution on single or multiple GPUs, and can thus reduce static power consumption under the guidance of task partitioning.

Keywords: Electric power utilization

Implementation of ternary Shor's algorithm based on vibrational states of an ion in anharmonic potential

Chinese Physics B, 2015, Vol. 24, No. 3, pp. 157-165

Authors: Liu Wei (刘威), Chen Shuming (陈书明), Zhang Jian (张见), Wu Chunwang (吴春旺), Wu Wei (吴伟), Chen Pingxing (陈平形). Affiliations: College of Computer, National University of Defense Technology; Science and Technology on Parallel and Distributed Processing Laboratory (PDL), National University of Defense Technology; College of Science, National University of Defense Technology

It is widely believed that Shor's factoring algorithm provides a driving force to boost the quantum computing ***, a serious obstacle to its binary implementation is the large number of quantum gates. Non-binary quantum computing is an efficient way to reduce the required number of elemental gates. Here, we propose optimization schemes for Shor's algorithm implementation and take a ternary version for factorizing 21 as an example. The optimized factorization is achieved by a two-qutrit quantum circuit, which consists of only two single qutrit gates and one ternary controlled-NOT gate. This two-qutrit quantum circuit is then encoded into the nine lower vibrational states of an ion trapped in a weakly anharmonic potential. Optimal control theory (OCT) is employed to derive the manipulation electric field for transferring the encoded states. The ternary Shor's algorithm can be implemented in one single step. Numerical simulation results show that the accuracy of the state transformations is about 0.9919.

Keywords: ternary Shor's algorithm anharmonic ion trapping optimal control theory vibrational state
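
As a rough illustration of the factoring task named in the abstract above, the classical part of Shor's algorithm for N = 21 can be sketched in Python. This is not the paper's two-qutrit circuit or its ion-trap encoding: the order-finding step that a quantum computer would perform is replaced here by a brute-force loop, and the base a = 2 and the function names are assumptions chosen only for the example.

```python
from math import gcd


def multiplicative_order(a, n):
    """Smallest r > 0 with a**r % n == 1 (brute force stands in for
    the quantum order-finding step; hypothetical helper)."""
    if gcd(a, n) != 1:
        raise ValueError("a and n must be coprime")
    r, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        r += 1
    return r


def factor_from_order(n, a):
    """Classical post-processing: turn the order of a mod n into factors.

    Returns a sorted pair of non-trivial factors, or None when this
    base is unlucky (odd order, or a**(r/2) == -1 mod n).
    """
    r = multiplicative_order(a, n)
    if r % 2 == 1:
        return None
    y = pow(a, r // 2, n)
    if y == n - 1:
        return None
    p, q = gcd(y - 1, n), gcd(y + 1, n)
    return tuple(sorted((p, q)))


if __name__ == "__main__":
    # The order of 2 modulo 21 is 6, and 2**3 = 8,
    # so gcd(7, 21) = 7 and gcd(9, 21) = 3.
    print(factor_from_order(21, 2))  # (3, 7)
```

Only the order-finding step is expected to gain from quantum hardware; the gcd post-processing stays classical in any implementation.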

Optimizing guest swapping using elastic and transparent memory provisioning on virtualization platform

Frontiers of Computer Science, 2016, Vol. 10, No. 5, pp. 908-924

Authors: Xi LI, Pengfei ZHANG, Rui CHU, Huaimin WANG. Affiliations: School of Information Science and Engineering, Central South University, Changsha 410083, China; National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha 410008, China

On virtualization platforms, peak memory demand caused by hotspot applications often triggers page swapping in the guest OS, causing performance degradation inside and outside of the virtual machine (VM). Even when the host holds sufficient memory pages, the guest OS is unable to use free pages in the host directly because of the semantic gap between the virtual machine monitor (VMM) and the guest operating system (OS). Our work aims at using the free memory scattered across multiple hosts in a virtualization environment to improve the performance of guest swapping in a transparent and implicit way. Based on an analysis of the behavioral characteristics of guest swapping, we design and implement a distributed and scalable framework, HybridSwap. It dynamically constructs virtual swap pools using various policies, and builds a synthetic swapping mechanism in a peer-to-peer way that can adaptively choose among different virtual swap pools. We implement a prototype of HybridSwap and evaluate it with several benchmarks in different scenarios. The evaluation results demonstrate that our solution improves guest swapping efficiency and doubles performance in some cases. Even in the worst case, the system overhead introduced by HybridSwap is acceptable.

Keywords: virtualization memory management guest swapping performance degradation

Mobility of internet-based virtual computing environment

15th International Conference on Parallel and Distributed Systems, ICPADS '09

Authors: Shen, Siqi; Wang, Ji; Shen, Rui; Zhang, Shengdong; Fan, Pei. Affiliation: National Laboratory for Parallel and Distributed Processing, Changsha 410073, China

ISBN: (print) 9780769539003

The Internet-based Virtual Computing Environment (iVCE) provides on-demand aggregation and autonomic collaboration mechanisms to facilitate the utilization of autonomous and dynamic Internet resources. Load balancing and fault tolerance are important issues when scheduling those transient resources. In this paper, we propose a mobility mechanism for the migration of various roles of agents in the iVCE platform. The mobility mechanism involves two parts of the iVCE platform: role container layer and event service layer. At the role container layer, a novel approach is proposed to handle the code and data mobility issue. At the event service layer, an efficient routing reconfiguration protocol is proposed based on a publish/subscribe system over DHTs to facilitate task migrations. Certain conditions must be satisfied before the migration of an agent to ensure the correctness of the whole process. Experiments are conducted to evaluate the performance of the mobility mechanism, and the experimental results show that it is suitable for implementing load balancing and fault tolerance in the iVCE. © 2009 IEEE.

Keywords: Fault tolerance

MilkyWay-2 supercomputer: system and application

Frontiers of Computer Science, 2014, Vol. 8, No. 3, pp. 345-356

Authors: Xiangke LIAO, Liquan XIAO, Canqun YANG, Yutong LU. Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China; College of Computer, National University of Defense Technology, Changsha 410073, China

On June 17, 2013, the MilkyWay-2 (Tianhe-2) supercomputer was crowned the fastest supercomputer in the world on the 41st TOP500 list. This paper provides an overview of the MilkyWay-2 project and describes the design of its hardware and software systems. The key architecture features of MilkyWay-2 are highlighted, including neo-heterogeneous compute nodes integrating commodity-off-the-shelf processors and accelerators that share a similar instruction set architecture, powerful networks that employ proprietary interconnection chips to support massively parallel message-passing communications, a proprietary 16-core processor designed for scientific computing, and efficient software stacks that provide a high performance file system, an emerging programming model for heterogeneous systems, and intelligent system administration. We perform extensive evaluation with wide-ranging applications, from the LINPACK and Graph500 benchmarks to massively parallel software deployed in the system.

Keywords: MilkyWay-2 supercomputer petaflops computing neo-heterogeneous architecture interconnect network heterogeneous programming model system management benchmark optimization performance evaluation

The TH Express high performance interconnect networks

Frontiers of Computer Science, 2014, Vol. 8, No. 3, pp. 357-366

Authors: Zhengbin PANG, Min XIE, Jun ZHANG, Yi ZHENG, Guibin WANG, Dezun DONG, Guang SUO. Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China; College of Computer, National University of Defense Technology, Changsha 410073, China

The interconnection network plays an important role in scalable high performance computer (HPC) systems. The TH Express-2 interconnect has been used in the MilkyWay-2 system to provide high-bandwidth and low-latency interprocessor communications, and continuous efforts are devoted to the development of our proprietary interconnect. This paper describes the state of the art of our proprietary interconnect, with emphasis on the design of the network interface. Several key features are introduced, such as user-level communication, remote direct memory access, offloaded collective operations, and hardware-reliable end-to-end communication. The design of a low-level message passing infrastructure and of upper-level message passing services is also proposed. Preliminary performance results demonstrate the efficiency of the TH interconnect interface.

Keywords: HPC network interface chip (NIC) TH Express interconnect offload collective operation

QSobel: A novel quantum image edge extraction algorithm

Science China (Information Sciences), 2015, Vol. 58, No. 1, pp. 107-119

Authors: ZHANG Yi, LU Kai, GAO YingHui. Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology; College of Computer, National University of Defense Technology; College of Electronic Science and Engineering, National University of Defense Technology

Edge extraction is an indispensable task in digital image processing. With the sharp increase in image data, the real-time problem has become a limitation of state-of-the-art edge extraction *** this paper, QSobel, a novel quantum image edge extraction algorithm, is designed based on the flexible representation of quantum images (FRQI) and the well-known Sobel edge extraction algorithm. Because FRQI uses the superposition state of a qubit sequence to store all the pixels of an image, QSobel can calculate the Sobel gradients of the image intensity of all the pixels simultaneously, which is the main reason QSobel can extract edges quickly. By designing and analyzing the quantum circuit of QSobel, we demonstrate that QSobel can extract edges with a computational complexity of O(n^2) for an FRQI quantum image of size 2^n × 2^n. Compared with all the classical edge extraction algorithms and the existing quantum edge extraction algorithms, QSobel can utilize quantum parallel computation to reach a significant and exponential ***, QSobel would resolve the real-time problem of image edge extraction.

Keywords: edge extraction quantum image processing FRQI Sobel computational complexity
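
For readers unfamiliar with the classical baseline the abstract above compares against, the Sobel gradient computation can be sketched in plain Python. This is the standard classical operator, not the QSobel quantum circuit; the function name, the |Gx| + |Gy| magnitude approximation, and the choice to leave border pixels at zero are assumptions made for the example.

```python
# Standard 3x3 Sobel kernels for horizontal (KX) and vertical (KY) gradients.
KX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]
KY = [[-1, -2, -1],
      [0, 0, 0],
      [1, 2, 1]]


def sobel_magnitude(img):
    """Return |Gx| + |Gy| for each interior pixel of a 2D list of
    intensities; border pixels are left at 0 (an assumption here)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    p = img[y + dy - 1][x + dx - 1]
                    gx += KX[dy][dx] * p
                    gy += KY[dy][dx] * p
            out[y][x] = abs(gx) + abs(gy)
    return out


if __name__ == "__main__":
    # A 4x4 image with a vertical step edge between columns 1 and 2.
    step = [[0, 0, 10, 10] for _ in range(4)]
    for row in sobel_magnitude(step):
        print(row)
```

The nested loops visit every pixel once, so the classical cost grows with the pixel count, 2^(2n) for a 2^n × 2^n image; this is the scaling the abstract contrasts with its O(n^2) claim.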

GPU acceleration of subgraph isomorphism search in large scale graph

Journal of Central South University, 2015, Vol. 22, No. 6, pp. 2238-2249

Authors: Yang Bo (杨博), Lu Kai (卢凯), Gao Yinghui (高颖慧), Wang Xiaoping (王小平), Xu Kai (徐凯). Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology; College of Computer, National University of Defense Technology; Department of Electronic Science and Engineering, National University of Defense Technology

A novel framework for parallel subgraph isomorphism on GPUs, named GPUSI, is proposed; it consists of GPU region exploration and GPU subgraph matching. GPUSI iteratively enumerates subgraph instances and solves the subgraph isomorphism problem in a divide-and-conquer fashion. The framework relies entirely on graph traversal and avoids the explicit join operation. Moreover, to improve its performance, a task-queue based method and the virtual-CSR graph structure are used to balance the workload among warps, and a warp-centric programming model is used to balance the workload among threads in a warp. A prototype of GPUSI is implemented, and comprehensive experiments on various graph isomorphism operations are carried out on diverse large graphs. The experiments clearly demonstrate that GPUSI has good scalability and can achieve a speed-up of 1.4–2.6 compared to the state-of-the-art solutions.

Keywords: parallel graph isomorphism GPU backtrack paradigm
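
The traversal-based, join-free enumeration described in the abstract above can be illustrated with a small sequential backtracking search in Python. This is only a sketch of the general backtrack paradigm named in the keywords, not GPUSI itself: it runs on a single CPU thread, uses plain adjacency dictionaries rather than the virtual-CSR structure, and the function name and graph encoding are assumptions made for the example.

```python
def subgraph_matches(pattern, target):
    """Enumerate injective mappings of pattern nodes to target nodes
    such that every pattern edge maps onto a target edge.

    Both graphs are undirected, given as dict: node -> set of neighbours.
    """
    p_nodes = list(pattern)
    matches = []

    def extend(mapping):
        # All pattern nodes placed: record one complete match.
        if len(mapping) == len(p_nodes):
            matches.append(dict(mapping))
            return
        u = p_nodes[len(mapping)]  # next pattern node to place
        for v in target:
            if v in mapping.values():
                continue  # keep the mapping injective
            # Every already-mapped neighbour of u must be adjacent to v.
            if all(mapping[w] in target[v] for w in pattern[u] if w in mapping):
                mapping[u] = v
                extend(mapping)
                del mapping[u]  # backtrack

    extend({})
    return matches


if __name__ == "__main__":
    triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
    graph = {"a": {"b", "c"}, "b": {"a", "c"},
             "c": {"a", "b", "d"}, "d": {"c"}}
    found = subgraph_matches(triangle, graph)
    print(len(found))  # 6: the single triangle a-b-c in all 3! orderings
```

Each partial mapping is extended by walking candidate target nodes and checking adjacency directly, so no intermediate edge tables are joined; distributing such partial mappings evenly across workers is the load-balancing problem the abstract addresses with task queues and warp-centric scheduling.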
