ISBN: 9781450341219 (Print)
Large performance growth for processors requires exploitation of hardware parallelism, which, itself, requires parallelism in software. In spite of massive efforts, automatic parallelization of serial programs has had limited success, mostly for regular programs with affine accesses, but not for many applications including irregular ones. It appears that the bare minimum that the programmer needs to spell out is which operations can be executed in parallel. However, parallel programming today requires so much more. The programmer is expected to partition a task into subtasks (often threads) so as to meet multiple constraints and objectives, involving data and computation partitioning, locality, synchronization, race conditions, and limiting and hiding communication latencies. It is no wonder that this makes parallel programming hard, drastically reducing programmers' productivity and performance gains, and hence reducing adoption by programmers and their employers. Suppose, however, that the effort of the programmer is reduced to merely stating which operations can be executed in parallel, the 'work-depth' bare minimum abstraction developed for PRAM (the lead theory of parallel algorithms). What performance penalty should this incur? Perhaps surprisingly, the upshot of our work is that this can be done with no performance penalty relative to hand-optimized multi-threaded code.
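To make the contrast concrete, below is a minimal sketch of the work-depth style in C. OpenMP is used here purely as an illustrative stand-in (an assumption; the abstract does not name the paper's own language or runtime): the programmer states only which operations are independent and may run in parallel, and leaves thread partitioning, scheduling, and synchronization to the compiler and runtime.

/* Hedged sketch: work-depth-style parallelism, with OpenMP
 * standing in for a PRAM-style framework (not the authors' system).
 * The programmer's entire parallelism effort is one declaration of
 * which iterations are independent; no threads, chunking, or locks
 * appear in the source. */
#include <stdio.h>

#define N 1000000

int main(void) {
    static int a[N], b[N];

    for (int i = 0; i < N; i++)
        a[i] = i;

    /* Work-depth view: N independent operations, depth 1.
     * This single pragma is the "bare minimum" statement of
     * what can be executed in parallel. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        b[i] = a[i] * 2;

    printf("b[N-1] = %d\n", b[N - 1]);
    return 0;
}

In hand-optimized multi-threaded code, the same loop would additionally require the programmer to choose a thread count, partition the index range, and manage thread startup and joining; the paper's claim is that stating only the parallelism, as above, need not cost performance relative to that effort.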