Search results - Inner Mongolia University Library

MilkyWay-2 supercomputer: system and application

Frontiers of Computer Science, 2014, Vol. 8, No. 3, pp. 345-356

Authors: Xiangke LIAO, Liquan XIAO, Canqun YANG, Yutong LU (Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China; College of Computer, National University of Defense Technology, Changsha 410073, China)

On June 17, 2013, the MilkyWay-2 (Tianhe-2) supercomputer was crowned the fastest supercomputer in the world on the 41st TOP500 list. This paper provides an overview of the MilkyWay-2 project and describes the design of its hardware and software systems. The key architectural features of MilkyWay-2 are highlighted, including neo-heterogeneous compute nodes integrating commodity off-the-shelf processors and accelerators that share a similar instruction set architecture; powerful networks that employ proprietary interconnection chips to support massively parallel message-passing communications; a proprietary 16-core processor designed for scientific computing; and efficient software stacks that provide a high-performance file system, an emerging programming model for heterogeneous systems, and intelligent system administration. We perform extensive evaluation with wide-ranging applications, from the LINPACK and Graph500 benchmarks to massively parallel software deployed in the system.

Keywords: MilkyWay-2 supercomputer; petaflops computing; neo-heterogeneous architecture; interconnect network; heterogeneous programming model; system management; benchmark optimization; performance evaluation

The TH Express high performance interconnect networks

Frontiers of Computer Science, 2014, Vol. 8, No. 3, pp. 357-366

Authors: Zhengbin PANG, Min XIE, Jun ZHANG, Yi ZHENG, Guibin WANG, Dezun DONG, Guang SUO (Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China; College of Computer, National University of Defense Technology, Changsha 410073, China)

The interconnection network plays an important role in scalable high-performance computer (HPC) systems. The TH Express-2 interconnect has been used in the MilkyWay-2 system to provide high-bandwidth, low-latency interprocessor communications, and continuous efforts are devoted to the development of our proprietary interconnect. This paper describes the state of the art of our proprietary interconnect, with emphasis on the design of the network interface. Several key features are introduced, such as user-level communication, remote direct memory access, offloaded collective operations, and hardware-reliable end-to-end communication. The designs of a low-level message-passing infrastructure and an upper-level message-passing service are also proposed. The preliminary performance results demonstrate the efficiency of the TH interconnect interface.

Keywords: HPC; network interface chip (NIC); TH Express; interconnect; offload collective operation

Memory Access Analysis of Many-core System with Abundant Bandwidth

IEEE International Symposium on Embedded Multicore SoCs (MCSoC)

Authors: Chuan Tang, Dan Liu, Zuocheng Xing, Peng Yang, Zhe Wang, Jiang Xu (Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China; Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong, China)

ISBN: 9781479986712 (print)

Many-core systems are currently the main architectural trend. One of the dominant challenges for on-chip many-core systems is the memory wall. However, traditional research primarily focuses on limited bandwidth. To address this problem, many-core systems are equipped with large caches, and many complex memory and cache techniques are adopted to relieve bandwidth pressure and improve cache efficiency. All these methods incur considerable area and power costs. In this paper, we are motivated by the abundant bandwidth and low latency of optical interconnects. We analyze the memory access characteristics of a 64-core system under high bandwidth, which can be assumed to come from an optical interconnect, and consider the sensitivity of different benchmarks to bandwidth and cache. Finally, we discuss promising basic frameworks suitable for many-core systems with optical interconnects.

Keywords: Bandwidth; Benchmark testing; Optical interconnections; Random access memory; Delays; Hidden Markov models; Computers

High-energy-density electron beam from interaction of two successive laser pulses with subcritical-density plasma

Physical Review Accelerators and Beams, 2016, Vol. 19, No. 2, article 021301

Authors: J. W. Wang, W. Yu, M. Y. Yu, H. Xu, J. J. Ju, S. X. Luan, M. Murakami, M. Zepf, S. Rykovanov (Helmholtz Institute Jena, Jena 07743, Germany; State Key Laboratory of High Field Laser Physics, Shanghai Institute of Optics and Fine Mechanics, Chinese Academy of Sciences, Shanghai 201800, China; Institute for Fusion Theory and Simulation and the Department of Physics, Zhejiang University, Hangzhou 310027, China; Institute for Theoretical Physics I, Ruhr University, Bochum D-44780, Germany; National Laboratory for Parallel and Distributed Processing, School of Computer Science, National University of Defense Technology, Changsha 410073, China; Institute of Laser Engineering, Osaka University, Osaka 565-0871, Japan; Centre for Plasma Physics, School of Mathematics and Physics, Queen's University Belfast, Belfast BT7 1NN, United Kingdom)

It is shown by particle-in-cell simulations that a narrow electron beam with high energy and charge density can be generated in a subcritical-density plasma by two consecutive laser pulses. Although the first laser pulse dissipates rapidly, the second pulse can propagate for a long distance in the thin wake channel created by the first pulse and can further accelerate the preaccelerated electrons therein. Given that the second pulse also self-focuses, the resulting electron beam has a narrow waist and high charge and energy densities. Such beams are useful for enhancing the target-back space-charge field in target normal sheath acceleration of ions and bremsstrahlung sources, among others.

Keywords: Plasma acceleration & new acceleration techniques

DREAMS: Dynamic resource allocation for MapReduce with data skew

IFIP/IEEE International Symposium on Integrated Network Management

Authors: Zhihong Liu, Qi Zhang, Mohamed Faten Zhani, Raouf Boutaba, Yaping Liu, Zhenghu Gong (College of Computer, National University of Defense Technology, Changsha, China; David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, Hunan, China)

ISBN: 9781479982424 (print)

MapReduce has become a popular model for large-scale data processing in recent years. However, existing MapReduce schedulers still suffer from an issue known as partitioning skew, where the output of map tasks is unevenly distributed among reduce tasks. In this paper, we present DREAMS, a framework that provides run-time partitioning skew mitigation. Unlike previous approaches that try to balance the workload of reducers by repartitioning the intermediate data assigned to each reduce task, DREAMS copes with partitioning skew by adjusting the run-time resource allocation of each task. We show that this approach allows DREAMS to eliminate the overhead of data repartitioning. Through experiments using both real and synthetic workloads running on an 11-node virtualized Hadoop cluster, we show that DREAMS can effectively mitigate the negative impact of partitioning skew, thereby improving job performance by up to 20.3%.

Keywords: Resource management; Containers; Predictive models; Mathematical model; Monitoring; Biomedical monitoring; Yarn

MilkyWay-2: back to the world Top 1

Frontiers of Computer Science, 2014, Vol. 8, No. 3, pp. 343-344

Authors: Xiangke LIAO (Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China; College of Computer, National University of Defense Technology, Changsha 410073, China)

On the 41st Top500 list, announced in June 2013, the MilkyWay-2 system produced by the National University of Defense Technology (NUDT) in China won first place with a LINPACK test result of 33.86 PFLOPS. It has been one and a half years since its predecessor, MilkyWay-1 (TH-1), reached the same place for the first time. On the newest Top500 list, published in November 2013, MilkyWay-2 retained first place.

HybridSwap: A scalable and synthetic framework for guest swapping on virtualization platform

IEEE Annual Joint Conference: INFOCOM, IEEE Computer and Communications Societies

Authors: Pengfei Zhang, Xi Li, Rui Chu, Huaimin Wang (National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha, China; School of Information Science and Engineering, Central South University, Changsha, China)

In IaaS cloud environments, peak memory demand caused by hotspot applications in a virtual machine (VM) often results in performance degradation inside and outside that VM. Solutions such as host swapping and ballooning have been proposed for memory consolidation and overcommitment. These solutions, however, do not help with guest swapping inside the VM. Even when the host holds sufficient memory pages, the guest OS cannot use the host's free pages directly because of the semantic gap between the VMM and the guest. Our goal is to alleviate the performance degradation by reducing the disk I/O operations generated by guest swapping. Based on an in-depth analysis of the behavioral features of guest swapping, we design HybridSwap, a distributed, scalable framework that organizes surplus memory in all hosts within a data center into virtual pools for swapping. This framework builds a synthetic swapping mechanism in a peer-to-peer way, in which a VM can adaptively choose suitable pools for swapping. We implement a prototype of HybridSwap and evaluate it with different benchmarks. The results demonstrate that our solution improves guest swapping efficiency. In some cases, it shows a 2-5 times performance improvement compared with the baseline setup.

Keywords: Benchmark testing; Virtualization; Semantics; Servers; Operating systems; Degradation; Instruction sets

Constraint-Relaxation Approach for Nonnegative Matrix Factorization: A Case Study

IEEE International Conference on Systems, Man and Cybernetics

Authors: Jue Wang, Naiyang Guan, Xuhui Huang, Zhigang Luo (Science and Technology on Parallel and Distributed Processing Laboratory, College of Computer, National University of Defense Technology, Hunan, P.R. China; Department of Computer Science and Technology, College of Computer, National University of Defense Technology, Hunan, P.R. China)

ISBN: 9781479986989 (print)

Nonnegative matrix factorization (NMF) is a powerful technique for dimensionality reduction. Conventional NMF algorithms usually keep the matrices W and H nonnegative while iterating. However, to get the NMF of a matrix, it is unnecessary to force the temporary solutions in the iterations to be nonnegative. In this paper, we propose a two-stage approach for NMF. At the relaxation stage, the nonnegativity constraint on temporary solutions is relaxed and a real-valued matrix factorization is generated. At the constraint stage, the real-valued matrix factorization is transformed into a nonnegative matrix factorization by an invertible linear transformation. Based on this approach, we study exact nonnegative matrix factorization when rank=2. We prove that, given two real-valued matrices of rank=2, there exists an invertible linear transformation that transforms the real-valued matrices into nonnegative matrices while keeping their product unchanged. We propose an algorithm to find the transformation. When the rank is higher than 2, this kind of transformation may not exist. The experiments show that this approach can reach a nonnegative matrix factorization with lower reconstruction error than conventional methods, and that the technique for rank=2 exact NMF works well.

Keywords: Algorithm design and analysis; Matrix decomposition; Additives; Computers; Transforms; Matrix converters; Clustering algorithms
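
The constraint stage described in this abstract relies on a simple identity: for any invertible T, (W T)(T^-1 H) = W H, so a real-valued factorization can be re-expressed without changing its product. The following is a minimal sketch of that identity only, not the authors' algorithm for finding T. The helper names (`matmul`, `inverse_2x2`, `transform_factors`) and the example matrices are hypothetical and chosen by hand for illustration.

```python
from fractions import Fraction


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse_2x2(t):
    """Exact inverse of a 2x2 matrix (assumed invertible), using rationals."""
    (a, b), (c, d) = t
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("T must be invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def transform_factors(w, h, t):
    """Return (W T, T^-1 H); their product equals W H for any invertible T."""
    return matmul(w, t), matmul(inverse_2x2(t), h)


def is_nonnegative(m):
    return all(x >= 0 for row in m for x in row)


# Hand-picked rank-2 example (an assumption, not data from the paper):
# W contains a negative entry, yet W H is nonnegative.
W = [[1, -1], [0, 1], [1, 0]]
H = [[4, 3], [3, 1]]
T = [[1, 1], [0, 1]]  # invertible transformation chosen by inspection

W_nn, H_nn = transform_factors(W, H, T)
print(is_nonnegative(W), is_nonnegative(W_nn), is_nonnegative(H_nn))
print(matmul(W, H) == matmul(W_nn, H_nn))
```

Here T is supplied by hand; the paper's contribution, per the abstract, is an algorithm that finds such a T when rank=2.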

Fast image matching algorithm based on affine invariants

Journal of Central South University, 2014, Vol. 21, No. 5, pp. 1907-1918

Authors: 张毅, 卢凯, 高颖慧 (National Laboratory for Parallel and Distributed Processing, National University of Defense Technology; College of Computer, National University of Defense Technology; College of Electronic Science and Engineering, National University of Defense Technology)

Feature-based image matching algorithms play an indispensable role in automatic target recognition (ATR). In this work, a fast image matching algorithm (FIMA) is proposed that uses the geometric feature of the extended centroid (EC) to build affine invariants. Based on the affine invariant of the length ratio of two parallel line segments, FIMA overcomes the invalidation problem of state-of-the-art algorithms based on affine geometric features and increases the feature diversity of different targets, thus reducing the misjudgment rate during target recognition. However, it is found that FIMA suffers from the parallelogram contour problem and coincidence invalidation. An advanced FIMA is designed to cope with these problems. Experiments show that the proposed algorithms are more robust to Gaussian noise, gray-scale change, contrast change, illumination, and small three-dimensional rotation. Compared with the latest fast image matching algorithms based on geometric features, FIMA achieves a speedup of approximately 1.75 times. Thus, FIMA would be more suitable for practical ATR applications.

Keywords: affine invariants; image matching; extended centroid; robustness; performance

Improving vertex-frontier based GPU breadth-first search

Journal of Central South University, 2014, Vol. 21, No. 10, pp. 3828-3836

Authors: 杨博, 卢凯, 高颖慧, 徐凯, 王小平, 程志权 (Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology; College of Computer, National University of Defense Technology; Department of Electronic Science and Engineering, National University of Defense Technology; Avatar Science Company)

Breadth-first search (BFS) is an important kernel for graph traversal and has been used by many graph processing applications. Extensive studies have been devoted to boosting the performance of BFS. As the most effective solution, GPU acceleration achieves the state-of-the-art result of 3.3×10^9 traversed edges per second on an NVIDIA Tesla C2050 GPU. A novel vertex-frontier-based GPU BFS algorithm is proposed, with three main features. First, to obtain a better workload balance for irregular graphs, a virtual-queue task decomposition and mapping strategy is introduced for vertex frontier expansion. Second, a global duplicate-detection scheme is proposed to remove duplicate vertices from the vertex frontier effectively. Finally, a GPU-based bottom-up BFS approach is employed to process large frontiers. The experimental results demonstrate that the algorithm achieves a 10% improvement over the state-of-the-art method on diverse graphs. In particular, it exhibits a 2-3 times speedup over the state of the art on low-diameter and scale-free graphs on an NVIDIA Tesla K20c GPU, reaching a peak traversal rate of 11.2×10^9 edges/s.

Keywords: breadth-first search; GPU; graph traversal; vertex frontier
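
To make the terms in this abstract concrete, here is a minimal sequential sketch of frontier-based BFS that switches to a bottom-up step when the frontier grows large. It is a CPU illustration under stated assumptions, not the paper's GPU algorithm. It assumes an undirected graph stored as adjacency lists and omits the virtual-queue decomposition entirely. The function name `bfs_levels` and the parameter `bottom_up_threshold` are hypothetical.

```python
def bfs_levels(adj, source, bottom_up_threshold=0.5):
    """Return the BFS level of every vertex (-1 if unreachable).

    adj: adjacency lists of an undirected graph (assumption).
    bottom_up_threshold: fraction of vertices above which the frontier
    is treated as "large" (hypothetical tuning knob).
    """
    n = len(adj)
    level = [-1] * n
    level[source] = 0
    frontier = {source}  # a set, so duplicate vertices are dropped
    depth = 0
    while frontier:
        depth += 1
        next_frontier = set()
        if len(frontier) > bottom_up_threshold * n:
            # Bottom-up step: each unvisited vertex looks for a
            # neighbor that is already in the current frontier.
            for v in range(n):
                if level[v] == -1 and any(u in frontier for u in adj[v]):
                    level[v] = depth
                    next_frontier.add(v)
        else:
            # Top-down step: expand every vertex in the frontier.
            for u in frontier:
                for v in adj[u]:
                    if level[v] == -1:
                        level[v] = depth
                        next_frontier.add(v)
        frontier = next_frontier
    return level


# Small undirected example graph (made up for illustration).
graph = [[1, 2], [0, 3], [0, 3], [1, 2, 4], [3]]
print(bfs_levels(graph, 0))
```

Both directions assign the same levels. The bottom-up step only changes how much work is done when most vertices are already in the frontier.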
