Author affiliations: Natl Digital Switching Syst Engn & Technol Res Ctr, Zhengzhou 450001, Peoples R China; State Key Lab Math Engn & Adv Comp, Zhengzhou 450001, Peoples R China
Publication: COMPUTER JOURNAL
Year/Volume/Issue: 2016, Vol. 59, No. 1
Pages: 119-132
Subject classification: 08 [Engineering]; 0835 [Engineering - Software Engineering]; 0812 [Engineering - Computer Science and Technology (degrees awardable in engineering or science)]
Funding: HEGAOJI Major Project of China [2009ZX01036-001-001-2]; Open Project Program of the State Key Laboratory of Mathematical Engineering and Advanced Computing [2013A11]
Keywords: parallelizing compiler; code generation; MPI; message aggregation
Abstract: Compiling for distributed-memory architectures comprises two main phases. The first phase determines the computation and data decomposition, a problem that a great deal of work addressed in the 1990s. The second phase is code generation, for which there is still no effective solution. Existing methods generate code on the basis of the computation and data decomposition and introduce various communication optimizations, since communication is one of the main factors degrading performance. These approaches can transfer redundant communication data because they do not optimize communication jointly with code generation. In this paper, we propose a novel code generation technique for distributed-memory architectures. First, we determine the communication sender and receiver by traversing a loop-based tree structure, and we find the most appropriate point to insert each message so as to support message aggregation. Second, we construct the communication set by proposing a set of code generation rules, and we prove their correctness and accuracy; redundant communication is thus eliminated. We have also evaluated programs ranging from micro-kernels to applications in the NAS parallel benchmarks, and have compared their performance with message passing interface (MPI), High Performance Fortran (HPF) and Unified Parallel C (UPC) versions. Compared with these versions, our compiler generates fewer communication points. The generated code outperforms the HPF and UPC versions as well as the state of the art, and its average performance reaches 70% of that of the hand-coded MPI programs.
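The message aggregation mentioned in the abstract can be illustrated with a small sketch. The idea, common to communication optimization in parallelizing compilers, is that sending one large message per receiver is cheaper than many small ones, because each message pays a fixed startup (latency) cost. The sketch below is a hypothetical cost-model illustration of that principle, not code from the paper; the names `naive_sends`, `aggregated_sends`, and the constants `ALPHA`/`BETA` are illustrative assumptions.

```python
from collections import defaultdict

# Simple latency/bandwidth cost model (illustrative values):
# ALPHA = per-message startup cost, BETA = per-element transfer cost.
ALPHA, BETA = 100.0, 1.0

def naive_sends(messages):
    """One send per (receiver, payload) pair: each pays the startup cost."""
    return sum(ALPHA + BETA * len(payload) for _, payload in messages)

def aggregated_sends(messages):
    """Pack all payloads bound for the same receiver into one buffer,
    then issue a single send per receiver (fewer communication points)."""
    buckets = defaultdict(list)
    for receiver, payload in messages:
        buckets[receiver].extend(payload)
    cost = sum(ALPHA + BETA * len(buf) for buf in buckets.values())
    return cost, len(buckets)  # total cost, number of communication points

# Four small messages, three of them to the same receiver.
msgs = [(1, [0.0] * 8), (1, [0.0] * 8), (2, [0.0] * 8), (1, [0.0] * 8)]
agg_cost, n_points = aggregated_sends(msgs)
print(n_points)                       # 2 communication points instead of 4
print(agg_cost < naive_sends(msgs))   # True: startup cost is amortized
```

In a real compiler this packing happens at code generation time: the inserted send is hoisted to the point where all data for a given receiver is available, which is what "finding the most appropriate point to insert each message" refers to in the abstract.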