A wide range of applications in engineering and scientific computing rely on matrix products in which one of the matrices is sparse. The computational cost of these operations becomes very high as the matrix dimensions increase. The goal of this work is to accelerate the sparse matrix-matrix product (SpMM) on graphics processing units (GPUs). SpMM can be computed as a set of sparse matrix-vector products (SpMV). However, this approach does not reach optimal performance because it cannot exploit the high ratio of computation to memory access that characterizes SpMM. In this work, a routine called FastSpMM is described and its performance is evaluated. FastSpMM can be considered an extension of ELLR-T, a routine for computing SpMV on GPUs that is based on the ELLPACK-R storage format for sparse matrices. FastSpMM combines the high computation-to-memory-access ratio of SpMM with the advantages of ELLR-T in exploiting the GPU architecture. The CUSPARSE library supplied by NVIDIA, which also includes routines to compute SpMM on GPUs, is used as the reference for performance comparison. Experimental evaluations on a representative set of test matrices show that FastSpMM outperforms the corresponding CUSPARSE routine.
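The difference between SpMM and repeated SpMV can be illustrated with a minimal CPU sketch in plain Python. It is not the FastSpMM or ELLR-T GPU code: the function names (`to_ellpack_r`, `spmv`, `spmm`, `spmm_by_spmv`) and the row-wise array layout are assumptions made for illustration. The sketch stores a sparse matrix as padded value and column-index arrays plus a row-length array, which is the general idea behind ELLPACK-R, and shows that in the direct SpMM loop each stored nonzero is read once and reused for every column of the dense matrix.

```python
def to_ellpack_r(dense):
    """Convert a dense matrix (list of rows) to ELLPACK-R-style arrays.

    Returns (vals, cols, row_len): rows are padded to the longest row,
    and row_len[i] records how many entries of row i are real nonzeros.
    Assumption: row-wise Python lists are used here for readability;
    GPU implementations typically use a layout chosen for coalescing.
    """
    rows = [[(j, v) for j, v in enumerate(row) if v != 0] for row in dense]
    row_len = [len(r) for r in rows]
    width = max(row_len, default=0)
    vals = [[0.0] * width for _ in rows]
    cols = [[0] * width for _ in rows]
    for i, r in enumerate(rows):
        for k, (j, v) in enumerate(r):
            vals[i][k] = v
            cols[i][k] = j
    return vals, cols, row_len


def spmv(vals, cols, row_len, x):
    """Sparse matrix-vector product y = A x; padding is skipped via row_len."""
    return [
        sum(vals[i][k] * x[cols[i][k]] for k in range(row_len[i]))
        for i in range(len(row_len))
    ]


def spmm(vals, cols, row_len, B):
    """Sparse-dense product C = A B computed directly.

    Each nonzero a = A[i, col] is loaded once and reused across all
    columns of B, which is where the higher computation/memory ratio
    of SpMM comes from.
    """
    n_cols_b = len(B[0])
    C = [[0.0] * n_cols_b for _ in row_len]
    for i in range(len(row_len)):
        for k in range(row_len[i]):
            a = vals[i][k]
            b_row = B[cols[i][k]]
            for j in range(n_cols_b):
                C[i][j] += a * b_row[j]
    return C


def spmm_by_spmv(vals, cols, row_len, B):
    """Same product computed as one SpMV per column of B.

    The sparse matrix is traversed once per column, so its entries
    are re-read len(B[0]) times.
    """
    n_cols_b = len(B[0])
    columns = [
        spmv(vals, cols, row_len, [row[j] for row in B])
        for j in range(n_cols_b)
    ]
    return [list(row) for row in zip(*columns)]
```

Both functions return the same product; they differ only in how often the sparse matrix is traversed, which is the effect the abstract attributes to the SpMV-based approach.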
This work presents a hybrid computing approach that combines GPUs and multicore processors to take full advantage of the computing power latent in modern computers, and applies it to the problem of tomographic reconstruction. An inherent characteristic of these platforms is their heterogeneity, which raises the issue of distributing the workload among the different processing elements. Adaptive load-balancing techniques are thus necessary to adjust the amount of work assigned to each computing element. Here, we have chosen the 'on-demand' strategy, a well-known technique in the HPC field in which each element asynchronously requests a piece of work when it becomes idle, thereby keeping the system fairly well balanced. The results show that our scheme adapts to the heterogeneous platform on which it runs: it automatically assigns more work to the faster processing elements, which makes it possible to exploit all available resources and to obtain complete reconstructions in less time than pure CPU or pure GPU approaches.
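The 'on-demand' strategy can be sketched with a shared work queue from which each worker pulls the next chunk as soon as it becomes idle. The following is a minimal illustration using Python threads, not the authors' implementation: `run_on_demand`, the worker names, and the per-chunk delays are hypothetical, and `time.sleep` merely stands in for reconstructing one chunk of the tomogram on a CPU core or a GPU.

```python
import queue
import threading
import time


def run_on_demand(n_chunks, worker_delays):
    """Distribute n_chunks work items among workers on demand.

    worker_delays maps a worker name to its simulated time per chunk
    (a hypothetical stand-in for a CPU core or a GPU). Returns a dict
    mapping each worker name to the list of chunk ids it processed.
    """
    work = queue.Queue()
    for chunk_id in range(n_chunks):
        work.put(chunk_id)

    # Each worker appends only to its own list, so no extra lock is needed.
    done = {name: [] for name in worker_delays}

    def worker(name, delay):
        while True:
            try:
                # An idle worker asks for the next piece of work.
                chunk_id = work.get_nowait()
            except queue.Empty:
                return  # no work left
            time.sleep(delay)  # simulated processing of one chunk
            done[name].append(chunk_id)

    threads = [
        threading.Thread(target=worker, args=(name, delay))
        for name, delay in worker_delays.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


if __name__ == "__main__":
    result = run_on_demand(40, {"gpu": 0.001, "cpu": 0.005})
    for name, chunks in result.items():
        print(name, len(chunks))
```

No worker is given a fixed share in advance. A worker that finishes its chunks sooner simply returns to the queue more often, so faster elements tend to receive more work without any explicit speed measurement.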