In this paper, we develop an automatic compile-time technique for computation and data decomposition on distributed-memory machines. Our method handles complex programs containing perfect and non-perfect loop nests, with or without loop-carried dependences. Our algorithms divide a program into collections of loop nests, called clusters, such that data redistribution is allowed only between clusters. Within each cluster, decomposition and data locality constraints are formulated as a system of homogeneous linear equations, which is solved by polynomial-time algorithms. Our algorithm can selectively relax data locality constraints within a cluster to balance parallelism against data locality. These relaxations are guided by the program's hierarchical nesting structure, examined from outer to inner nesting levels, so that communication is kept at the outermost level possible.
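The abstract does not give the exact form of its equations, so the following is only a minimal sketch under our own assumptions of how a homogeneous locality system can be solved in polynomial time. We assume hypothetical unknowns: one coefficient per loop index for the computation decomposition (`c1`, `c2` for a loop nest over `i, j`) and one per array dimension for the data decompositions of two arrays `A` and `B`. A reference `A[i][j]` is assumed to require `c1 = dA1` and `c2 = dA2`, and a reference `B[j][i]` to require `c1 = dB2` and `c2 = dB1`. Exact Gauss-Jordan elimination then yields a basis of the solution space. The zero vector always solves a homogeneous system; a nontrivial basis vector corresponds to a decomposition that satisfies every locality constraint, while an empty basis is the situation in which some constraints would have to be relaxed.

```python
from fractions import Fraction


def null_space(rows):
    """Return a basis of {x : A x = 0} via exact Gauss-Jordan elimination.

    `rows` is a non-empty list of equal-length integer lists (the matrix A).
    Uses O(m * n * min(m, n)) rational operations, i.e. polynomial time.
    """
    m = [[Fraction(v) for v in r] for r in rows]
    n_cols = len(m[0])
    pivots = []  # pivot column of each reduced row, in row order
    r = 0
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot here: column c is a free variable
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [v / piv for v in m[r]]
        # Clear column c in every other row (reduced row echelon form).
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    # One basis vector per free column: set it to 1, solve for the pivots.
    free = [c for c in range(n_cols) if c not in pivots]
    basis = []
    for fc in free:
        x = [Fraction(0)] * n_cols
        x[fc] = Fraction(1)
        for row_idx, pc in enumerate(pivots):
            x[pc] = -m[row_idx][fc]
        basis.append(x)
    return basis


if __name__ == "__main__":
    # Hypothetical unknowns: loop coefficients, then A's and B's dimensions.
    names = ["c1", "c2", "dA1", "dA2", "dB1", "dB2"]
    locality = [
        [1, 0, -1, 0, 0, 0],   # A[i][j]: c1 - dA1 = 0
        [0, 1, 0, -1, 0, 0],   # A[i][j]: c2 - dA2 = 0
        [1, 0, 0, 0, 0, -1],   # B[j][i]: c1 - dB2 = 0
        [0, 1, 0, 0, -1, 0],   # B[j][i]: c2 - dB1 = 0
    ]
    for vec in null_space(locality):
        print({n: int(v) for n, v in zip(names, vec)})
    # {'c1': 0, 'c2': 1, 'dA1': 0, 'dA2': 1, 'dB1': 1, 'dB2': 0}
    # {'c1': 1, 'c2': 0, 'dA1': 1, 'dA2': 0, 'dB1': 0, 'dB2': 1}
```

In this toy system both basis vectors are communication-free choices: partitioning loop `i` places `A` by rows and `B` by columns, while partitioning loop `j` does the reverse. Exact rational arithmetic is used here only to avoid floating-point rank errors; it is a design choice of this sketch, not something the abstract specifies.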