The bulk synchronous parallel (BSP) model, as well as parallel programming interfaces based on BSP, classically target distributed-memory parallel architectures. In earlier work, Yzelman and Bisseling designed a MulticoreBSP for Java library specifically for shared-memory architectures. In the present article, we further investigate this concept and introduce the new high-performance MulticoreBSP for C library. Among other features, this library supports nested BSP runs. We show that existing BSP software performs well regardless of whether it runs on distributed-memory or shared-memory architectures, and that applications in MulticoreBSP can attain high performance. The paper details the implementation of the Fast Fourier Transform and of sparse matrix-vector multiplication in BSP, both of which outperform state-of-the-art implementations written in other shared-memory parallel programming interfaces. We furthermore study the applicability of BSP on highly non-uniform memory access architectures.
ISBN: (print) 9783642104848
In this paper, we describe our experience of creating an OpenMP implementation of Bit-reversal for Fast Fourier Transform programs from an existing unparallelizable sequential algorithm. The aim of this work is to present a case study of the development of a shared-memory parallel Bit-reversal for parallel FFT code that makes practical and efficient use of multi-core machines. We present our implementation and discuss the results of the case study in terms of the program improvements that may be needed to help parallel application developers with similar high-performance goals. Our preliminary studies, results, and experiments are based on FFT code running on a Dell 6850 platform with four 4-core Intel Xeon64 CPUs. The experimental results show that the performance of our new parallel code on a 16-core shared-memory machine is promising.
ISBN: (print) 9783642130663
Bit-reversal is widely known to be an important program and an essential part of the Fast Fourier Transform. If not carefully designed, it may easily take a large portion of an FFT application's total execution time. In this paper, we present a parallel implementation of Bit-reversal for FFT using Cilk and UPC. Based on our previous work, in which we created a parallel Bit-reversal using OpenMP in SPMD style from an unparallelized sequential algorithm, we note that it is possible to keep the existing parallelism by reorganizing the same program using the Cilk and UPC libraries while still achieving good performance. Experimental results were obtained by executing these parallel codes on two multi-core SMP platforms, and they are very promising.
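For readers unfamiliar with the operation, the following is a minimal sequential sketch of a bit-reversal permutation in Python. It is not the algorithm from either paper; the function names `bit_reverse` and `bit_reversal_permute` are hypothetical and chosen only for illustration. It assumes the input length is a power of two, as in a radix-2 FFT.

```python
def bit_reverse(index: int, bits: int) -> int:
    """Return `index` with its lowest `bits` bits in reversed order."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # append the lowest bit of index
        index >>= 1
    return result


def bit_reversal_permute(data: list) -> list:
    """Reorder `data` in place into bit-reversed index order and return it.

    Assumption: len(data) is a power of two.
    """
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    for i in range(n):
        j = bit_reverse(i, bits)
        if i < j:  # swap each (i, j) pair exactly once
            data[i], data[j] = data[j], data[i]
    return data
```

Each swap touches a distinct pair of positions, so in this simple formulation the loop iterations do not depend on one another; how the papers actually distribute the work across threads is not stated in the abstracts.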