Planning for performance: Enhancing achievable performance for MPI through persistent collective operations

Authors: Holmes, Daniel J.; Morgan, Bradley; Skjellum, Anthony; Bangalore, Purushotham V.; Sridharan, Srinivas

Affiliations: Univ Edinburgh, EPCC, Edinburgh EH9 3FD, Midlothian, Scotland; Auburn Univ, OIT, Auburn, AL 36849, USA; Univ Tennessee, SimCtr, Chattanooga, TN 37403, USA; Univ Tennessee, Dept Comp Sci & Engn, Chattanooga, TN 37403, USA; Univ Alabama Birmingham, Dept Comp Sci, Birmingham, AL 35294, USA; Intel Corp, 23-56P Outer Ring Rd, Bangalore 560017, Karnataka, India

Publication: PARALLEL COMPUTING

Year, volume, issue: 2019, Vol. 81, January

Pages: 32-57

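Subject classification: 08 [Engineering]; 0812 [Engineering - Computer Science and Technology (Engineering or Science degrees may be awarded)]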

Funding: National Science Foundation [CCF-1562659, CCF-1562306, CCF-1617690, CCF-1822191, CCF-1821431, OAC-1541310, CNS-1229282]; European Union's Horizon 2020 Framework Programme Research and Innovation programme; Auburn University Hopper Cluster; University of Alabama at Birmingham Cheaha Cluster

Keywords: MPI; Collective communication; Persistence; Nonblocking; Optimized algorithm

Abstract: Advantages of nonblocking collective communication in MPI have been established over the past quarter century, even predating MPI-1. For regular computations with fixed communication patterns, significant additional optimizations can be revealed through the use of persistence (planned transfers), which is not currently available in the MPI-3 API except for a limited form of point-to-point persistence (aka half-channels) standardized since MPI-1. This paper covers the design and prototype implementation of LibPNBC (based on LibNBC), and the MPI-4 standardization status of persistent nonblocking collective operations. We provide early performance results, using a modified version of NBCBench and an example application (based on 3D conjugate gradient), illustrating the potential performance enhancements for such operations. Persistent operations enable MPI implementations to make intelligent choices about algorithm and resource utilization once and amortize this decision cost across many uses in a long-running program. Evidence that this approach is of value is provided. As with non-persistent nonblocking collective operations, strong progress and blocking completion notification are jointly needed to maximize the benefit of such operations (e.g., to support overlap of communication with computation and/or other communication). Further enhancement of the current reference implementation, as well as additional opportunities to enhance performance through the application of these new APIs, comprise future work. (C) 2018 Published by Elsevier B.V.
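The central idea of persistent collectives, deciding once at initialization and reusing that decision on every start, can be illustrated with a small single-process toy model. This is a sketch under stated assumptions, not LibPNBC or any MPI library: the `toy_*` names, the two-algorithm menu, and the power-of-two selection rule are all hypothetical, and the simulated "ranks" are simply buffers in one process. The init call stands in for a persistent-collective initialization, and each start-and-wait call reruns the stored plan without repeating the selection.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_RANKS 64

/* Hypothetical algorithm menu for this toy; real MPI libraries have their own. */
typedef enum { TOY_ALG_RECURSIVE_DOUBLING, TOY_ALG_LINEAR } toy_alg;

/* Toy persistent request: everything decided at init is stored and reused. */
typedef struct {
    double **bufs;        /* one buffer per simulated rank; addresses must stay fixed */
    int nranks;           /* simulated communicator size */
    int count;            /* elements per buffer */
    toy_alg alg;          /* algorithm chosen once, at init time */
    int plan_decisions;   /* how many times algorithm selection ran */
    int starts;           /* how many times the planned operation executed */
} toy_persistent_allreduce;

static int toy_is_power_of_two(int n) { return n > 0 && (n & (n - 1)) == 0; }

/* Analogue of an *_init call: validate arguments and select an algorithm once. */
int toy_allreduce_init(toy_persistent_allreduce *req, double **bufs, int nranks, int count)
{
    if (req == NULL || bufs == NULL || nranks < 1 || nranks > TOY_MAX_RANKS || count < 1)
        return -1;
    req->bufs = bufs;
    req->nranks = nranks;
    req->count = count;
    /* Hypothetical rule: this recursive-doubling variant needs a power-of-two size. */
    req->alg = toy_is_power_of_two(nranks) ? TOY_ALG_RECURSIVE_DOUBLING : TOY_ALG_LINEAR;
    req->plan_decisions = 1;
    req->starts = 0;
    return 0;
}

/* Analogue of a start followed by a wait: execute the stored plan (sum, in place). */
void toy_allreduce_start_and_wait(toy_persistent_allreduce *req)
{
    assert(req != NULL && req->bufs != NULL);  /* must follow a successful init */
    double **b = req->bufs;
    if (req->alg == TOY_ALG_RECURSIVE_DOUBLING) {
        /* log2(P) rounds: rank r pairs with r XOR mask and both keep the pairwise sum. */
        for (int mask = 1; mask < req->nranks; mask <<= 1) {
            for (int r = 0; r < req->nranks; r++) {
                int partner = r ^ mask;
                if (partner < r)
                    continue;  /* visit each disjoint pair once */
                for (int i = 0; i < req->count; i++) {
                    double s = b[r][i] + b[partner][i];
                    b[r][i] = s;
                    b[partner][i] = s;
                }
            }
        }
    } else {
        /* Linear fallback: reduce each element across ranks, then copy it back to all. */
        for (int i = 0; i < req->count; i++) {
            double s = 0.0;
            for (int r = 0; r < req->nranks; r++)
                s += b[r][i];
            for (int r = 0; r < req->nranks; r++)
                b[r][i] = s;
        }
    }
    req->starts++;
}
```

After one init, any number of start-and-wait calls reuse the same plan, so `plan_decisions` stays at 1 while `starts` grows; that is the amortization the abstract describes, in miniature. In an MPI 4.0 program the corresponding lifecycle is `MPI_Allreduce_init` once (it also takes an `MPI_Info` argument), then `MPI_Start` and `MPI_Wait` in each iteration, and finally `MPI_Request_free`; as in the toy, the buffers bound at initialization keep their addresses while their contents may change between starts. The prototype described in the paper predates MPI 4.0, so its entry points may be named differently.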
