Search results - Inner Mongolia University Library

Fat-tree routing and node ordering providing contention free traffic for MPI global collectives

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2012, Vol. 72, No. 11, pp. 1423-1432

Author: Zahavi, Eitan (Mellanox Technol LTD, Elect Engn, IL-20692 Yokneam, Israel)

As the size of High Performance Computing clusters grows, so does the probability of interconnect hot spots that degrade the latency and effective bandwidth the network provides. This paper presents a solution to this scalability problem for real-life constant bisectional-bandwidth fat-tree topologies. It is shown that maximal bandwidth and cut-through latency can be achieved for MPI global collective traffic. To form such a congestion-free configuration, MPI programs should use collective communication, the MPI-node-order should be topology-aware, and the packet routing should match the MPI communication patterns. First, we show that MPI collectives can be classified into unidirectional and bidirectional shifts. Using this property, we propose a scheme for congestion-free routing of the global collectives in fully and partially populated fat trees running a single job. The no-contention result is then extended to multiple jobs running on the same fat tree by applying some job size and placement restrictions. Simulation results of the proposed routing, MPI-node-order and communication patterns show no contention, which provides a 40% throughput improvement over previously published results for all-to-all collectives. (C) 2012 Elsevier Inc. All rights reserved.

Keywords: Network topologies; routing algorithms and techniques; Collective communication
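The abstract names three conditions: collective traffic, a topology-aware node order, and routing matched to the pattern. The minimal Python sketch below illustrates them together. It is not the paper's algorithm. It assumes a hypothetical two-level fat tree with `k` hosts per leaf switch and `k` spine switches. One unidirectional shift stage is modeled as rank r sending to rank (r + s) mod N. Cross-leaf packets are routed through spine `dst_host % k`, a destination-mod-k rule assumed here for illustration. The function names `shift_pairs` and `max_link_load` are hypothetical. A maximum link load of 1 means no two messages share a directed link during that stage.

```python
from collections import Counter


def shift_pairs(n_ranks: int, s: int) -> list[tuple[int, int]]:
    """Unidirectional shift by s: rank r sends to rank (r + s) mod n_ranks."""
    return [(r, (r + s) % n_ranks) for r in range(n_ranks)]


def max_link_load(k: int, n_leaves: int, order: list[int], s: int) -> int:
    """Largest number of messages sharing one directed link during shift s.

    Hypothetical two-level fat tree: k hosts per leaf switch, k spine
    switches, one link between every leaf and every spine. Host h sits on
    leaf h // k, and MPI rank r runs on host order[r]. Cross-leaf traffic
    goes up to spine (dst_host % k) and down to the destination leaf
    (destination-mod-k routing, assumed for illustration only).
    """
    n = k * n_leaves
    load = Counter()
    for src_rank, dst_rank in shift_pairs(n, s):
        src, dst = order[src_rank], order[dst_rank]
        src_leaf, dst_leaf = src // k, dst // k
        if src_leaf == dst_leaf:
            continue  # switched inside the leaf; no uplink or downlink used
        spine = dst % k
        load[("up", src_leaf, spine)] += 1
        load[("down", spine, dst_leaf)] += 1
    return max(load.values(), default=0)


if __name__ == "__main__":
    k, n_leaves = 2, 3
    n = k * n_leaves
    # Topology-aware order: consecutive ranks fill one leaf before the next.
    aware = list(range(n))
    print([max_link_load(k, n_leaves, aware, s) for s in range(1, n)])  # [1, 1, 1, 1, 1]
    # Scrambled order on the same tree: ranks 0 and 2 (both on leaf 0) pick spine 0.
    scrambled = [0, 2, 1, 4, 3, 5]
    print(max_link_load(k, n_leaves, scrambled, 1))  # 2
```

With the topology-aware order, every shift stage has a maximum link load of 1. The scrambled order makes two sources on the same leaf share an uplink. This only shows why node order and routing have to be chosen together. Bidirectional shifts, partially populated trees, and multiple jobs, which the abstract also covers, are not modeled.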

THE IBM BLUE GENE/Q INTERCONNECTION FABRIC

IEEE MICRO, 2012, Vol. 32, No. 1, pp. 32-43

Authors: Chen, Dong; Eisley, Noel A.; Heidelberger, Philip; Senger, Robert M.; Sugawara, Yutaka; Kumar, Sameer; Salapura, Valentina; Satterfield, David L.; Steinmacher-Burow, Burkhard; Parker, Jeffrey J. (IBM TJ Watson Res Ctr, Blue Gene Supercomp Project Hardware Team, Yorktown Hts, NY 10598 USA; IBM Syst & Technol Grp, Rochester, MN USA; IBM TJ Watson Res Ctr, Serv Innovat Lab, Yorktown Hts, NY 10598 USA)

This article describes the IBM Blue Gene/Q interconnection network and message unit. Blue Gene/Q is the third generation in the IBM Blue Gene line of massively parallel supercomputers and can be scaled to 20 petaflops and beyond. For better application scalability and performance, Blue Gene/Q has new routing algorithms and techniques to parallelize the injection and reception of packets in the network interface.

Keywords: Blue Gene/Q; BG/Q; Parallel Computer Architecture; Interconnect Technologies; Router Architecture; routing algorithms and techniques; Network Interface Architecture; Message Unit; Interconnection Network

SymSig: A Low Latency interconnection topology for HPC clusters

20th International Conference on High Performance Computing (HiPC)

Authors: Brahme, Dhananjay; Bhardwaj, Onkar; Chaudhary, Vipin (Tata Consultancy Serv, Ctr Excellence High Performance Comp, Pune 411057, Maharashtra, India; Rensselaer Polytech Inst, Dept Elect Comp & Syst Engn, Troy, NY 12180 USA; SUNY Buffalo, Dept Comp Sci & Engn, Buffalo, NY 14260 USA)

ISBN: (print) 9781479907298

This paper presents the underlying theory and the performance of a cluster using a new 2-hop network topology. This topology, called SymSig, is constructed using a symmetric equation and Singer Difference Sets. The node degree in SymSig is about half that of previous methods based on Singer Difference Sets. A comparison with a Clos-topology cluster shows significant advantages. The worst-case congestion in the SymSig topology for unicast permutations is 2, whereas in Clos it is proportional to the radix of the building-block switches used. For SymSig, the number of switches required is about 25% smaller, the cluster size is about 15% larger, and the worst-case bandwidth is about 50% better. These advantages are retained for petascale and exascale systems. Its performance on a set of collectives such as exchange-all, shift-all, broadcast-all and all-to-all send/receive shows improvements ranging from 39% to 83%. On the molecular dynamics application GROMACS, it shows an improvement of up to 33%. This network is particularly suitable for applications that require global all-to-all communication. Its low latency makes it scalable and an attractive alternative for building petascale and exascale systems.

Keywords: network topology; bandwidth; latency; computer architecture; parallel computing; parallel computer architecture; high performance computing; exascale computing; benchmark; communication library functions; high performance computing applications; routing algorithms and techniques
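The SymSig abstract builds on Singer Difference Sets. The minimal Python sketch below illustrates why such sets can yield a 2-hop network. It checks the difference-set property for the classical set {0, 1, 3} mod 7 (q = 2, v = q^2 + q + 1). It then uses the standard wiring in which node i attaches to switch j whenever (j - i) mod v lies in the set. Because every nonzero difference occurs exactly once, each pair of nodes then shares exactly one switch. This is the generic projective-plane construction, assumed here for illustration. It is not SymSig's symmetric equation, which the abstract credits with halving the node degree and which is not reproduced. The helper names are hypothetical.

```python
def is_planar_difference_set(d: list[int], v: int) -> bool:
    """True if every nonzero residue mod v occurs exactly once as a - b (a != b, a, b in d)."""
    counts = [0] * v
    for a in d:
        for b in d:
            if a != b:
                counts[(a - b) % v] += 1
    return all(c == 1 for c in counts[1:])


def attach_nodes_to_switches(d: list[int], v: int) -> dict[int, set[int]]:
    """Hypothetical wiring: node i links to switch (i + x) mod v for each x in d."""
    return {i: {(i + x) % v for x in d} for i in range(v)}


def shared_switches(wiring: dict[int, set[int]], a: int, b: int) -> set[int]:
    """Switches through which nodes a and b reach each other in 2 hops."""
    return wiring[a] & wiring[b]


if __name__ == "__main__":
    v, d = 7, [0, 1, 3]  # q = 2: v = q*q + q + 1 = 7; each node has q + 1 = 3 links
    assert is_planar_difference_set(d, v)
    wiring = attach_nodes_to_switches(d, v)
    # Every pair of distinct nodes meets at exactly one switch (node -> switch -> node).
    for a in range(v):
        for b in range(a + 1, v):
            assert len(shared_switches(wiring, a, b)) == 1
    print(shared_switches(wiring, 0, 1))  # {1}
```

The same check passes for q = 3, where {0, 1, 3, 9} mod 13 gives 13 nodes with 4 links each. The abstract's congestion and bandwidth figures are not modeled here.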