ISBN: 9781450344937 (print)
In the many-core era, the performance of MPI collectives depends increasingly on the intra-node communication component. However, the communication algorithms are generally inherited from their inter-node versions and ignore cache complexity. We propose cache-oblivious algorithms for MPI all-to-all operations, in which data blocks are copied into the receive buffers in Morton order to exploit data locality. Experimental results on different many-core architectures show that our cache-oblivious implementations significantly outperform both the naive implementations based on a shared heap and highly optimized MPI libraries.
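The abstract does not show the authors' implementation, so the following is only a minimal single-process sketch of the idea of copying all-to-all blocks in Morton (Z-order) sequence. It assumes the blocks are indexed by a (source rank, destination rank) pair and that the source index takes the odd bits of the Morton code; the names `morton_decode`, `morton_pairs`, and `all_to_all` are hypothetical, and plain Python lists stand in for the shared send and receive buffers. No real MPI calls are used.

```python
def morton_decode(z):
    """Split the interleaved bits of z: odd bits form the row, even bits the column."""
    row = col = 0
    bit = 0
    while z:
        col |= (z & 1) << bit
        row |= ((z >> 1) & 1) << bit
        z >>= 2
        bit += 1
    return row, col


def morton_pairs(p):
    """Yield every (src, dst) pair with 0 <= src, dst < p in Morton order."""
    # Round p up to a power of two so the Z-curve covers the whole grid
    # (assumption: out-of-range pairs are simply skipped).
    side = 1
    while side < p:
        side *= 2
    for z in range(side * side):
        src, dst = morton_decode(z)
        if src < p and dst < p:
            yield src, dst


def all_to_all(send):
    """Simulate an intra-node all-to-all: block send[src][dst] lands in recv[dst][src]."""
    p = len(send)
    recv = [[None] * p for _ in range(p)]
    # Visiting blocks along the Z-curve keeps consecutive copies close together
    # in both the source and destination index, at every scale.
    for src, dst in morton_pairs(p):
        recv[dst][src] = send[src][dst]
    return recv
```

The traversal order does not depend on any cache-size parameter, which is what makes this style of copy cache-oblivious; the result is the same transpose-like exchange a row-by-row loop would produce.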
ISBN: 9781450328210 (print)
This paper considers the problem of cache-obliviously scheduling streaming pipelines on uniprocessors with the goal of minimizing cache misses. Our recursive algorithm is not parameterized by cache size, yet it achieves the asymptotically minimum number of cache misses with constant factor memory augmentation.