Search Results - Inner Mongolia University Library

Comparison of heavy-ion induced SEU for D- and TMR-flip-flop designs in 65-nm bulk CMOS technology

Science China (Information Sciences), 2014, Vol. 57, No. 10, pp. 223-229

Authors: HE YiBai, CHEN ShuMing. Affiliations: School of Computer Science, National University of Defense Technology; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology

Heavy-ion experiments were performed on a D flip-flop (DFF) and a TMR flip-flop (TMRFF) fabricated in a 65-nm bulk CMOS process. The experimental results show that the TMRFF has about a 92% smaller SEU cross-section than the standard DFF design in static test mode. In dynamic test mode, the TMRFF shows a much stronger frequency dependence than the DFF design, which reduces its advantage over the DFF at higher operating frequencies. At 160 MHz, the TMRFF is only 3.2× harder than the standard DFF. Such a small improvement in SEU performance may warrant reconsidering the use of TMR in hardened designs.
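
For readers unfamiliar with triple modular redundancy (TMR), the sketch below shows the generic 2-of-3 majority vote that a TMR storage element relies on. It is a minimal software illustration, not the circuit evaluated in the paper; the function names `majority_vote` and `tmr_read` and the sample bit patterns are assumptions made for this example.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit equals the value held by at least two inputs."""
    return (a & b) | (b & c) | (a & c)


def tmr_read(copies: list[int]) -> int:
    """Return the voted value of three redundant copies (hypothetical helper)."""
    if len(copies) != 3:
        raise ValueError("TMR needs exactly three copies")
    return majority_vote(*copies)


stored = 0b1011

# One copy suffers an upset in bit 2; the voter masks it.
single_upset = [stored, stored ^ 0b0100, stored]

# Two copies are upset in the same bit; the voter outputs the wrong value.
double_upset = [stored ^ 0b0100, stored ^ 0b0100, stored]

print(bin(tmr_read(single_upset)))
print(bin(tmr_read(double_upset)))
```

The second case shows the limit of the scheme: the vote only protects against errors confined to one of the three copies.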

Keywords: SEU, flip-flop, TMR, heavy-ion, frequency

OpenMP-Based Monte Carlo Dose Calculation for Radiotherapy Treatment Planning on the Intel MIC Architecture

IEEE International Symposium on Information (IT) in Medicine and Education, ITME

Authors: Qinglin Wang, Jie Liu, Peizhen Xie, Chunye Gong, Yuan Li, Zuocheng Xing. Affiliations: School of Computer Science, National University of Defense Technology, Changsha, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China

Monte Carlo (MC) simulation plays an important part in dose calculation for radiotherapy treatment planning. Because the accuracy of MC simulation depends on the number of simulated particle histories, it is very time-consuming. The Intel Many Integrated Core (MIC) architecture, which consists of more than 50 cores and supports many parallel programming models, provides an efficient alternative for accelerating MC dose calculation. This paper implements the OpenMP-based MC Dose Planning Method (DPM) for radiotherapy treatment problems on the Intel MIC architecture. The implementation has been verified on a target MIC coprocessor with 57 cores. The results demonstrate that the OpenMP-based DPM implementation produces very accurate results and achieves a maximum speedup of 10.53 times over the original DPM on a Xeon E5-2670 CPU. The speedup and efficiency of the implementation on different numbers of MIC cores are also reported.

Keywords: Microwave integrated circuits, Instruction sets, Computer architecture, Photonics, Computational modeling, History, Interpolation

High-energy-density electron beam from interaction of two successive laser pulses with subcritical-density plasma

Physical Review Accelerators and Beams, 2016, Vol. 19, No. 2, 021301

Authors: J. W. Wang, W. Yu, M. Y. Yu, H. Xu, J. J. Ju, S. X. Luan, M. Murakami, M. Zepf, S. Rykovanov. Affiliations: Helmholtz Institute Jena, Jena 07743, Germany; State Key Laboratory of High Field Laser Physics, Shanghai Institute of Optics and Fine Mechanics, Chinese Academy of Sciences, Shanghai 201800, China; Institute for Fusion Theory and Simulation and the Department of Physics, Zhejiang University, Hangzhou 310027, China; Institute for Theoretical Physics I, Ruhr University, Bochum D-44780, Germany; National Laboratory for Parallel and Distributed Processing, School of Computer Science, National University of Defense Technology, Changsha 410073, China; Institute of Laser Engineering, Osaka University, Osaka 565-0871, Japan; Centre for Plasma Physics, School of Mathematics and Physics, Queen’s University Belfast, Belfast BT7 1NN, United Kingdom

It is shown by particle-in-cell simulations that a narrow electron beam with high energy and charge density can be generated in a subcritical-density plasma by two consecutive laser pulses. Although the first laser pulse dissipates rapidly, the second pulse can propagate for a long distance in the thin wake channel created by the first pulse and can further accelerate the preaccelerated electrons therein. Given that the second pulse also self-focuses, the resulting electron beam has a narrow waist and high charge and energy densities. Such beams are useful for enhancing the target-back space-charge field in target normal sheath acceleration of ions and bremsstrahlung sources, among others.

Keywords: Plasma acceleration & new acceleration techniques

Classification of Tiangong-1 hyperspectral remote sensing image via contextual sparse coding

International Conference on Machine Learning and Cybernetics (ICMLC)

Authors: Qi Lv, Yong Dou, Xin Niu, Jiaqing Xu, Jinbo Xu. Affiliations: School of Computer, National University of Defense Technology, Changsha, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China

ISBN: (print) 9781467372220

Hyperspectral remote sensing is one of the frontier techniques in remote sensing research. Applying the sparse coding model to hyperspectral remote sensing image processing is a hot topic in hyperspectral information processing. To improve the accuracy of hyperspectral image classification, we propose a classification method based on spatial-spectral joint contextual sparse coding. First, a dictionary is trained on samples selected from the ground-truth reference data. Then, the sparse coefficients of each pixel are calculated from the learned dictionary. Finally, the sparse coefficients are fed to the classifier to obtain the final classification result. The visible and near-infrared hyperspectral remote sensing image collected by Tiangong-1 over Chaoyang District, Beijing, is used to evaluate the proposed approach. Experimental results show that the proposed method yields the best classification performance among the compared methods, with an overall accuracy of 95.74% and a Kappa coefficient of 0.9476.

Keywords: Hyperspectral image, Remote sensing, Sparse coding, Classification Methods, hyperspectral remote sensing, hyperspectral imagery, Image classification

Design and Implementation of a Highly Efficient DGEMM for 64-Bit ARMv8 Multi-core Processors

International Conference on Parallel Processing (ICPP)

Authors: Feng Wang, Hao Jiang, Ke Zuo, Xing Su, Jingling Xue, Canqun Yang. Affiliations: School of Computer Science, National University of Defense Technology, Changsha, China; School of Computer Science and Engineering, University of New South Wales, NSW, Australia; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China

This paper presents the design and implementation of a highly efficient Double-precision General Matrix Multiplication (DGEMM) based on OpenBLAS for 64-bit ARMv8 eight-core processors. We adopt a theory-guided approach by first developing a performance model for this architecture and then using it to guide our exploration. The key enabler for a highly efficient DGEMM is a highly optimized inner kernel, GEBP, developed in assembly language. We have obtained GEBP by (1) maximizing its compute-to-memory access ratios across all levels of the memory hierarchy in the ARMv8 architecture, with its performance-critical block sizes determined analytically, and (2) optimizing its computations through loop unrolling, instruction scheduling and software-implemented register rotation, and by taking advantage of A64 instructions to support efficient FMA operations, data transfers and prefetching. We have compared our DGEMM implemented in OpenBLAS with another implemented in ATLAS (also built on a highly optimized GEBP in assembly). Our implementation outperforms the one in ATLAS by improving the peak performance (efficiency) of DGEMM from 3.88 Gflops (80.9%) to 4.19 Gflops (87.2%) on one core and from 30.4 Gflops (79.2%) to 32.7 Gflops (85.3%) on eight cores. These results translate into substantial performance (efficiency) improvements of 7.79% on one core and 7.70% on eight cores. In addition, the efficiency of our implementation on one core is very close to the theoretical upper bound of 91.5% obtained from micro-benchmarking. Our parallel implementation achieves good performance and scalability under varying thread counts across the range of matrix sizes evaluated.
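
The improvement percentages quoted in this abstract can be checked from the reported efficiencies. The short calculation below is a sketch using only the figures in the abstract; the helper name `relative_gain` is an assumption for illustration. It also derives the peak rate implied by each Gflops/efficiency pair.

```python
def relative_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


# Efficiencies (percent of peak) reported for ATLAS vs. the OpenBLAS-based DGEMM.
one_core_gain = relative_gain(80.9, 87.2)
eight_core_gain = relative_gain(79.2, 85.3)

# Peak rate implied by achieved Gflops divided by efficiency.
implied_peak_one_core = 4.19 / 0.872
implied_peak_eight_cores = 32.7 / 0.853

print(f"one core:    {one_core_gain:.2f}% gain, implied peak {implied_peak_one_core:.1f} Gflops")
print(f"eight cores: {eight_core_gain:.2f}% gain, implied peak {implied_peak_eight_cores:.1f} Gflops")
```

The quoted 7.79% and 7.70% match the ratios of the efficiency figures. Applying the same formula to the rounded Gflops values gives slightly different numbers (about 7.99% and 7.57%), presumably because of rounding in the reported rates.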

Keywords: Registers, Kernel, Computational modeling, Program processors, Assembly, Memory management

HybridSwap: A scalable and synthetic framework for guest swapping on virtualization platform

IEEE Annual Joint Conference: INFOCOM, IEEE Computer and Communications Societies

Authors: Pengfei Zhang, Xi Li, Rui Chu, Huaimin Wang. Affiliations: National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha, China; School of Information Science and Engineering, Central South University, Changsha, China

In IaaS cloud environments, peak memory demand caused by hotspot applications in a Virtual Machine (VM) often results in performance degradation both inside and outside that VM. Solutions such as host swapping and ballooning have been proposed for memory consolidation and overcommitment. These solutions, however, do not address guest swapping inside the VM. Even when the host holds sufficient memory pages, the guest OS cannot use the host's free pages directly because of the semantic gap between the VMM and the guest. Our goal is to alleviate the performance degradation by reducing the disk I/O operations generated by guest swapping. Based on an in-depth analysis of the behavioral features of guest swapping, we design HybridSwap, a distributed, scalable framework that organizes surplus memory in all hosts within a data center into virtual pools for swapping. The framework builds a synthetic swapping mechanism in a peer-to-peer manner, in which a VM can adaptively choose suitable pools for swapping. We implement a prototype of HybridSwap and evaluate it with different benchmarks. The results demonstrate that our solution improves guest swapping efficiency; in some cases it delivers a 2-5 times performance improvement over the baseline setup.

Keywords: Benchmark testing, Virtualization, Semantics, Servers, Operating systems, Degradation, Instruction sets

Accelerating FDTD simulation of microwave pulse coupling into narrow slots on the Intel MIC architecture

IEEE Pacific Rim Conference on Communications, Computers and Signal Processing

Authors: Qinglin Wang, Jie Liu, Xiantao Cui, Guitao Fu, Chunye Gong, Zuocheng Xing. Affiliations: School of Computer Science, National University of Defense Technology, Changsha, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China; Beijing Satellite Navigation Center, Beijing, China

ISBN: (print) 9781467377898

The coupling of microwaves into apertures plays an important part in many electromagnetic physics and engineering fields. When the apertures are very narrow, Finite Difference Time Domain (FDTD) simulation of the coupling is very time-consuming. As a many-core architecture, Intel's Many Integrated Core (MIC) architecture provides 512-bit vector units and more than 200 threads. In this paper, we parallelize the FDTD simulation of microwave pulse coupling into narrow slots on the Intel MIC architecture. In the implementation, the OpenMP parallel programming model is used to exploit thread parallelism, while loop unrolling and SIMD intrinsic functions are used for vectorization. Compared with the serial version on an Intel Xeon E5-2670 CPU, the implementation on a MIC coprocessor with 57 cores obtains a speedup of 11.57 times. The experimental results also demonstrate that the parallelization scales well. The effect of the binding between OpenMP threads and MIC hardware threads on performance is also reported.
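
As background on the method named in this abstract, the sketch below is a textbook one-dimensional FDTD leapfrog update in normalized units. It is not the paper's slot-coupling solver; the grid size, Courant number, Gaussian source and the function name `fdtd_1d` are all assumptions chosen for illustration. The two inner loops are independent per-cell updates, the kind of loop that thread and SIMD parallelism typically targets.

```python
import math


def fdtd_1d(n_cells: int = 200, n_steps: int = 100, source_cell: int = 100,
            courant: float = 0.5) -> list[float]:
    """Run a 1D FDTD simulation and return the final electric field.

    Normalized units are assumed: the magnetic field is scaled by the
    free-space impedance, so both updates use the same Courant number.
    """
    ez = [0.0] * n_cells  # electric field at integer grid points
    hy = [0.0] * n_cells  # magnetic field at half-integer grid points

    for step in range(n_steps):
        # Update H from the spatial difference of E.
        for k in range(n_cells - 1):
            hy[k] += courant * (ez[k + 1] - ez[k])

        # Update E from the spatial difference of H; ez[0] is never
        # updated, so it stays zero (a reflecting boundary).
        for k in range(1, n_cells):
            ez[k] += courant * (hy[k] - hy[k - 1])

        # Additive Gaussian pulse source centred on time step 30.
        ez[source_cell] += math.exp(-((step - 30) / 10.0) ** 2)

    return ez


field = fdtd_1d()
print(max(abs(v) for v in field))
```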

Keywords: Microwave integrated circuits, Finite difference methods, Time-domain analysis, Chlorine, Hardware

MilkyWay-2 supercomputer: system and application

Frontiers of Computer Science, 2014, Vol. 8, No. 3, pp. 345-356

Authors: Xiangke LIAO, Liquan XIAO, Canqun YANG, Yutong LU. Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China; College of Computer, National University of Defense Technology, Changsha 410073, China

On June 17, 2013, the MilkyWay-2 (Tianhe-2) supercomputer was crowned the fastest supercomputer in the world on the 41st TOP500 list. This paper provides an overview of the MilkyWay-2 project and describes the design of its hardware and software systems. The key architectural features of MilkyWay-2 are highlighted, including neo-heterogeneous compute nodes integrating commodity off-the-shelf processors and accelerators that share a similar instruction set architecture, powerful networks that employ proprietary interconnection chips to support massively parallel message-passing communications, a proprietary 16-core processor designed for scientific computing, and efficient software stacks that provide a high-performance file system, an emerging programming model for heterogeneous systems, and intelligent system administration. We perform an extensive evaluation with wide-ranging applications, from the LINPACK and Graph500 benchmarks to massively parallel software deployed in the system.

Keywords: MilkyWay-2 supercomputer, petaflops computing, neo-heterogeneous architecture, interconnect network, heterogeneous programming model, system management, benchmark optimization, performance evaluation

The TH Express high performance interconnect networks

Frontiers of Computer Science, 2014, Vol. 8, No. 3, pp. 357-366

Authors: Zhengbin PANG, Min XIE, Jun ZHANG, Yi ZHENG, Guibin WANG, Dezun DONG, Guang SUO. Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China; College of Computer, National University of Defense Technology, Changsha 410073, China

The interconnection network plays an important role in scalable high performance computer (HPC) systems. The TH Express-2 interconnect has been used in the MilkyWay-2 system to provide high-bandwidth and low-latency interprocessor communications, and continuous efforts are devoted to the development of our proprietary interconnect. This paper describes the state of the art of our proprietary interconnect, with emphasis on the design of the network interface. Several key features are introduced, such as user-level communication, remote direct memory access, offloaded collective operations, and hardware-reliable end-to-end communication. The design of a low-level message-passing infrastructure and of upper-level message-passing services is also proposed. Preliminary performance results demonstrate the efficiency of the TH interconnect interface.

Keywords: HPC, network interface chip (NIC), TH Express interconnect, offload collective operation

DREAMS: Dynamic resource allocation for MapReduce with data skew

IFIP/IEEE International Symposium on Integrated Network Management

Authors: Zhihong Liu, Qi Zhang, Mohamed Faten Zhani, Raouf Boutaba, Yaping Liu, Zhenghu Gong. Affiliations: College of Computer, National University of Defense Technology, Changsha, China; David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, Hunan, China

ISBN: (print) 9781479982424

MapReduce has become a popular model for large-scale data processing in recent years. However, existing MapReduce schedulers still suffer from an issue known as partitioning skew, where the output of map tasks is unevenly distributed among reduce tasks. In this paper, we present DREAMS, a framework that provides run-time partitioning skew mitigation. Unlike previous approaches that try to balance the workload of reducers by repartitioning the intermediate data assigned to each reduce task, DREAMS copes with partitioning skew by adjusting task run-time resource allocation. We show that this approach allows DREAMS to eliminate the overhead of data repartitioning. Through experiments using both real and synthetic workloads running on an 11-node virtualized Hadoop cluster, we show that DREAMS can effectively mitigate the negative impact of partitioning skew, thereby improving job performance by up to 20.3%.
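
To make the term partitioning skew concrete, the sketch below hash-partitions a synthetic map output with one dominant key and then sizes each reducer's resource share in proportion to its load. Everything here is an assumption for illustration: the CRC32 partitioner, the sample records, and the proportional policy in `proportional_shares` are hypothetical and are not the allocation model used by DREAMS.

```python
import zlib


def partition(key: str, n_reducers: int) -> int:
    """Deterministic hash partitioner (CRC32 is an illustrative choice)."""
    return zlib.crc32(key.encode("utf-8")) % n_reducers


def reducer_loads(records: list[tuple[str, int]], n_reducers: int) -> list[int]:
    """Count how many intermediate records each reducer receives."""
    loads = [0] * n_reducers
    for key, _value in records:
        loads[partition(key, n_reducers)] += 1
    return loads


def proportional_shares(loads: list[int], total_units: float) -> list[float]:
    """Hypothetical policy: split resource units in proportion to load."""
    total = sum(loads)
    return [total_units * load / total for load in loads]


# Synthetic map output: one hot key accounts for 90% of the records.
records = [("hot", 1)] * 900 + [(f"k{i}", 1) for i in range(100)]

loads = reducer_loads(records, n_reducers=4)
skew = max(loads) / (sum(loads) / len(loads))  # heaviest load vs. mean load
shares = proportional_shares(loads, total_units=16.0)

print(loads, round(skew, 2), [round(s, 2) for s in shares])
```

All records sharing a key must go to the same reducer, so a hot key overloads one reduce task regardless of the hash function; the records stay where they are and only the resources given to each task change.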

Keywords: Resource management, Containers, Predictive models, Mathematical model, Monitoring, Biomedical monitoring, Yarn
