检索结果-内蒙古大学图书馆

30th IEEE International Parallel and distributed Processing Symposium (IPDPS)

作者： Azad, Ariful Buluc, Aydin Lawrence Berkeley Natl Lab Computat Res Div Berkeley CA 94720 USA

ISBN: (纸本)9781509021406

We design and implement scalable distributed-memory algorithms for maximum cardinality matching in bipartite graphs. Computing matchings on distributed-memory supercomputers is challenged by the irregular and asynchronous data access patterns in graph searches and the difficulty in processing long paths passing through multiple processors. We address these challenges by developing an algorithm based on matrix algebra. We employ bulk-synchronous matrix algebraic modules to implement graph searches, and Remote memory Access (RMA) operations to map asynchronous light-weight graph accesses. On real matrices, our algorithm achieves up to 18x speedup when we go from 24 cores to 2048 cores of a Cray XC30 supercomputer. Even higher speedups are obtained on larger synthetically generated graphs where our algorithms show good scaling on up to 12,000 cores.

关键词： Cardinality matching distributed-memory algorithm Parallel algorithm

来源：评论

学校读者我要写书评

暂无评论

Engineering a distributed-memory Triangle Counting algorithm 37

Engineering a Distributed-Memory Triangle Counting Algorithm

引用

37th IEEE International Parallel and distributed Processing Symposium (IPDPS)

作者： Sanders, Peter Uhl, Tim Niklas Karlsruhe Inst Technol Inst Theoret Informat Karlsruhe Germany

ISBN: (纸本)9798350337662

Counting triangles in a graph and incident to each vertex is a fundamental and frequently considered task of graph analysis. We consider how to efficiently do this for huge graphs using massively parallel distributed-memory machines. Unsurprisingly, the main issue is to reduce communication between processors. We achieve this by counting locally whenever possible and reducing the amount of information that needs to be sent in order to handle (possible) nonlocal triangles. We also achieve linear memory requirements despite superlinear communication volume by introducing a new asynchronous sparseall-to-all operation. Furthermore, we dramatically reduce startup overheads by allowing this communication to use indirect routing. Our algorithms scale (at least) up to 32 768 cores and are up to 18 times faster than the previous state of the art.

关键词： triangle counting graph analysis clustering coefficient distributed-memory algorithm MPI

来源：评论

学校读者我要写书评

暂无评论

CLAIRE: A distributed-memory SOLVER FOR CONSTRAINED LARGE DEFORMATION DIFFEOMORPHIC IMAGE REGISTRATION

引用

SIAM JOURNAL ON SCIENTIFIC COMPUTING 2019年第5期41卷 C548-C584页

作者： Mang, Andreas Gholami, Amir Davatzikos, Christos Biros, George Univ Houston Dept Math Houston TX 77204 USA Univ Calif Berkeley Dept Elect Engn & Comp Sci Berkeley CA 94720 USA Univ Penn Ctr Biomed Image Comp & Analyt Dept Radiol Philadelphia PA 19104 USA Univ Texas Austin Oden Inst Computat Engn & Sci Austin TX 78712 USA

With this work we release CLAIRE, a distributed-memory implementation of an effective solver for constrained large deformation diffeomorphic image registration problems in three dimensions. We consider an optimal control formulation. We invert for a stationary velocity field that parameterizes the deformation map. Our solver is based on a globalized, preconditioned, inexact reduced space Gauss-Newton-Krylov scheme. We exploit state-of-the-art techniques in scientific computing to develop an effective solver that scales to thousands of distributed memory nodes on high-end clusters. We present the formulation, discuss algorithmic features, describe the software package, and introduce an improved preconditioner for the reduced space Hessian to speed up the convergence of our solver. We test registration performance on synthetic and real data. We demonstrate registration accuracy on several neuroimaging datasets. We compare the performance of our scheme against different flavors of the Demons algorithm for diffeomorphic image registration. We study convergence of our preconditioner and our overall algorithm. We report scalability results on state-of-the-art supercomputing platforms. We demonstrate that we can solve registration problems for clinically relevant data sizes in two to four minutes on a standard compute node with 20 cores, attaining excellent data fidelity. With the present work we achieve a speedup of (on average) 5x with a peak performance of up to 17x compared to our former work.

关键词： diffeomorphic image registration LDDMM Newton-Krylov method KKT preconditioner optimal control distributed-memory algorithm PDE-constrained optimization

来源：评论

学校读者我要写书评

暂无评论

Brief Announcement: Scalable distributed String Sorting 24

Brief Announcement: Scalable Distributed String Sorting

引用

36th ACM Symposium on Parallelism in algorithms and Architectures (SPAA)

作者： Kurpicz, Florian Mehnert, Pascal Sanders, Peter Schimek, Matthias Karlsruhe Inst Technol Karlsruhe Germany

ISBN: (纸本)9798400704161

String sorting is an important part of tasks such as building index data structures. Unfortunately, current string sorting algorithms do not scale to massively parallel distributed-memory machines since they either have latency (at least) proportional to the number of processors.. or communicate the data a large number of times (at least logarithmic). We present practical and efficient algorithms for distributed-memory string sorting that scale to large p. Similar to state-of-the-art sorters for atomic objects, the algorithms have latency of about p(1/k) when allowing the data to be communicated k times. Experiments show good scaling behavior on a wide range of inputs on up to 49 152 cores. We achieve speedups of up to 5 over the current state-of-the-art distributed string sorting algorithms.

关键词： distributed-memory algorithm string sorting multi-level algorithm communication-efficient algorithm

来源：评论

学校读者我要写书评

暂无评论

Communication-Efficient String Sorting 34

Communication-Efficient String Sorting

引用

34th IEEE International Parallel and distributed Processing Symposium (IPDPS)

作者： Bingmann, Timo Sanders, Peter Schimek, Matthias Karlsruhe Inst Technol Karlsruhe Germany

ISBN: (纸本)9781728168760

There has been surprisingly little work on algorithms for sorting strings on distributed-memory parallel machines. We develop efficient algorithms for this problem based on the multi-way merging principle. These algorithms inspect only characters that are needed to determine the sorting order. Moreover, communication volume is reduced by also communicating (roughly) only those characters and by communicating repetitions of the same prefixes only once. Experiments on up to 1280 cores reveal that these algorithm are often more than five times faster than previous algorithms.

关键词： distributed-memory algorithm string sorting communication-efficient algorithm

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：