ISBN: (Print) 9780818680908
Many parallel architectures support a memory model where some memory accesses are local, and thus inexpensive, while other memory accesses are remote, and potentially quite expensive. To achieve good parallel performance, it is often necessary to reduce the number of remote memory accesses. This can be done by the programmer, the compiler, or a combination of both. The overall goal is to minimize the work required of the programmer and have the compiler automate the process as much as possible. The paper reports on compiler techniques for decreasing the number of remote memory accesses using locality analysis for a parallel dialect of C called EARTH-C. The locality analysis uses an algorithm inspired by type inference algorithms for fast points-to analysis. The algorithm estimates when an indirect reference via a pointer can be safely assumed to be a local access. The locality inference algorithm is also used to guide the automatic specialization of functions in order to take advantage of locality specific to particular calling contexts. The locality analysis and automatic specialization have been implemented in the EARTH-C compiler, which produces low-level threaded code for the EARTH multithreaded architecture. Experimental results are presented for a set of benchmarks that operate on irregular, dynamically allocated data structures. The techniques give moderate to significant speedups, and they lessen the burden on the programmer.
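To make the idea of type-inference-style locality analysis concrete, here is a minimal sketch in plain C. It is not the paper's algorithm: it assumes a simple unification scheme (union-find over pointer variables, in the spirit of fast points-to analyses) and a hypothetical two-point lattice, `LOCAL` and `MAYBE_REMOTE`. All names (`init_vars`, `assign`, `mark_remote`, `is_local`) are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VARS 64

/* Hypothetical two-point lattice: a class stays LOCAL only if no
   possibly-remote address is ever unified into it. */
typedef enum { LOCAL, MAYBE_REMOTE } Locality;

static int parent[MAX_VARS];   /* union-find forest over pointer variables */
static Locality loc[MAX_VARS]; /* locality, valid at class representatives */

/* Put each of the n pointer variables in its own class, optimistically LOCAL. */
void init_vars(int n) {
    assert(n >= 0 && n <= MAX_VARS);
    for (int i = 0; i < n; i++) {
        parent[i] = i;
        loc[i] = LOCAL;
    }
}

/* Find the class representative, compressing the path by halving. */
int find(int v) {
    while (parent[v] != v) {
        parent[v] = parent[parent[v]];
        v = parent[v];
    }
    return v;
}

/* Record that variable v may hold a remote address. */
void mark_remote(int v) {
    loc[find(v)] = MAYBE_REMOTE;
}

/* Model the assignment p = q by unifying the two classes.
   The merged class is MAYBE_REMOTE if either side was. */
void assign(int p, int q) {
    int rp = find(p), rq = find(q);
    if (rp == rq) return;
    if (loc[rp] == MAYBE_REMOTE) loc[rq] = MAYBE_REMOTE;
    parent[rp] = rq;
}

/* A dereference of v may be treated as a local access only if its
   whole class is still LOCAL. */
bool is_local(int v) {
    return loc[find(v)] == LOCAL;
}
```

Unification is deliberately conservative here: once two pointers are related by an assignment they share one locality verdict, which keeps the analysis near-linear but can lose precision. Function specialization, as described in the abstract, would correspond to running such an inference separately for calling contexts whose arguments are known to be local.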
ISBN: (Print) 9780818680908
Traditional compiler optimizations such as loop-invariant removal and common sub-expression elimination are standard in all optimizing compilers. The purpose of the paper is to present new versions of these optimizations that apply to programs using dynamically allocated data structures, and to show the effect of these optimizations on the performance of multithreaded programs. We show how heap pointer analyses can be used to support better dependence testing, new applications of the above traditional optimizations, and high-quality code generation for multithreaded architectures. We have implemented these analyses and optimizations in the EARTH-C compiler to study their impact on the performance of generated multithreaded code. We provide both static and dynamic measurements showing the effect of the optimizations applied individually and together. We note several general trends, discuss the performance tradeoffs, and suggest when specific optimizations are generally beneficial.
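As a rough illustration of loop-invariant removal applied to heap data, the plain C sketch below shows a loop before and after hoisting a load through a pointer. The types and function names (`Node`, `Config`, `scale_list_naive`, `scale_list_hoisted`) are hypothetical and do not come from the paper. The hoisted form is only valid under the assumption, which a heap pointer analysis would have to establish, that no list node's `value` field aliases `cfg->scale`.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node {
    int value;
    struct Node *next;
} Node;

typedef struct Config {
    int scale;
} Config;

/* Before: the loop stores to n->value, so without alias information the
   load of cfg->scale must be repeated on every iteration. */
void scale_list_naive(Node *head, const Config *cfg) {
    assert(cfg != NULL);
    for (Node *n = head; n != NULL; n = n->next)
        n->value *= cfg->scale;
}

/* After: assuming the analysis proves the stores to n->value never touch
   cfg->scale, the load is invariant and is hoisted out of the loop. */
void scale_list_hoisted(Node *head, const Config *cfg) {
    assert(cfg != NULL);
    int scale = cfg->scale; /* single load, possibly a remote access */
    for (Node *n = head; n != NULL; n = n->next)
        n->value *= scale;
}
```

On an architecture where `cfg` may live in remote memory, the hoisted version replaces one potentially remote load per iteration with a single load, which is why such a transformation matters more there than on a uniform-memory machine.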