As a typical Gauss-Seidel method, the lower-upper symmetric Gauss-Seidel (LU-SGS) method has an inherently strong data dependency that makes shared-memory parallelization difficult. On early multi-core processors, the pipelined parallel LU-SGS approach achieves promising scalability. However, on emerging many-core processors such as Xeon Phi, experience from our in-house high-order CFD program shows that the parallel efficiency drops dramatically to less than 25%. In this paper, we model and analyze the performance of the pipelined parallel LU-SGS algorithm and present a two-level pipeline (TL-Pipeline) approach that uses nested OpenMP to further exploit fine-grained parallelism and mitigate the parallel performance bottlenecks. Our TL-Pipeline approach achieves a 20% performance gain for a regular problem (256 x 256 x 256) on Xeon Phi. We also discuss some practical problems for realistic CFD simulations, including domain decomposition and algorithm parameter tuning. More generally, our work is applicable to the shared-memory parallelization of all Gauss-Seidel-like methods with intrinsic strong data dependency.
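To make the data dependency and the pipelining idea concrete, the following is a minimal sketch rather than the paper's implementation. It assumes a simplified 2D forward sweep in which each cell depends on its north and west neighbours, and it simulates the pipeline schedule serially in Python instead of using OpenMP threads. The function names (`sweep_sequential`, `sweep_pipelined`, `pipeline_efficiency`), the update formula, and the tile decomposition are all hypothetical choices made for illustration. The efficiency formula is the textbook fill/drain estimate for a one-level pipeline, not the performance model developed in the paper.

```python
def update_cell(x, i, j):
    """Gauss-Seidel-style update: uses already-updated north and west values."""
    north = x[i - 1][j] if i > 0 else 1.0  # assumed boundary value
    west = x[i][j - 1] if j > 0 else 1.0   # assumed boundary value
    x[i][j] = 0.5 * (north + west) + 1.0   # illustrative update formula


def split(n, parts):
    """Split range(n) into `parts` contiguous chunks of near-equal size."""
    return [range(k * n // parts, (k + 1) * n // parts) for k in range(parts)]


def sweep_sequential(n, m):
    """Reference sweep in plain lexicographic order."""
    x = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            update_cell(x, i, j)
    return x


def sweep_pipelined(n, m, threads, blocks):
    """Same sweep, scheduled as a pipeline of tiles.

    Hypothetical thread t owns row slab t and walks the column blocks in
    order. Tile (t, b) runs at stage t + b, after tiles (t - 1, b) and
    (t, b - 1) have finished, so all tiles within one stage are mutually
    independent and could run concurrently on real threads.
    """
    x = [[0.0] * m for _ in range(n)]
    row_slabs = split(n, threads)
    col_blocks = split(m, blocks)
    for stage in range(threads + blocks - 1):
        for t in range(threads):
            b = stage - t
            if 0 <= b < blocks:
                for i in row_slabs[t]:
                    for j in col_blocks[b]:
                        update_cell(x, i, j)
    return x


def pipeline_efficiency(threads, blocks):
    """Ideal efficiency with equal-cost tiles: useful tiles / (threads * stages)."""
    return blocks / (threads + blocks - 1)


if __name__ == "__main__":
    same = sweep_pipelined(12, 16, threads=4, blocks=8) == sweep_sequential(12, 16)
    print("pipelined result matches sequential:", same)
    print("ideal efficiency, 4 threads, 8 blocks:", pipeline_efficiency(4, 8))
```

Under this simplified estimate, efficiency falls as the thread count grows relative to the number of blocks, because more stages are spent filling and draining the pipeline. This is one plausible reason a one-level pipeline scales less well on many-core processors, though the paper's own analysis may identify other bottlenecks.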