The design of high-performance computing architectures requires performance analysis of large-scale parallel applications to derive various parameters concerning hardware design and software development. The process o...
详细信息
The design of high-performance computing architectures requires performance analysis of large-scale parallel applications to derive various parameters concerning hardware design and software development. The process of performance analysis and benchmarking an application can be done in several ways with varying degrees of fidelity. One of the most cost-effective ways is to do a coarse-grained study of large-scale parallel applications through the use of program skeletons. The concept of a "program skeleton" that we discuss in this article is an abstracted program that is derived from a larger program where sourcecode that is determined to be irrelevant is removed for the purposes of the skeleton. In this work, we develop a semiautomatic approach for extracting program skeletons based on compiler program analysis. We demonstrate correctness of our skeleton extraction process by comparing details from communication traces, as well as show the performance speedup of using skeletons by running simulations in the SST/macro simulator.
Multi-color ordering is a parallel ordering that allows programs to be parallelized by application to sequentially executed parts of the programs. While multi-color ordering parallelizes sequentially executed parts wi...
详细信息
ISBN:
(数字)9783642387180
ISBN:
(纸本)9783642387180;9783642387173
Multi-color ordering is a parallel ordering that allows programs to be parallelized by application to sequentially executed parts of the programs. While multi-color ordering parallelizes sequentially executed parts with data dependences and increases the number of parts executed in parallel, improved performance by multi-color ordering is sensitive to differences in the architectures and systems on which the programs are executed. This sensitivity requires us to tune the numbers of colors;i.e., modify programs for each architecture and system. In this work, we develop a code generator based on multi-color ordering and automatically tune the number of colors using a job-level parallel scripting language Xcrypt. Furthermore, we support block multi-color ordering that avoids the disadvantage of stride accesses in the original multi-color ordering, and evaluate and clarify the effectiveness of block multi-color ordering.
暂无评论