ISBN:
(Print) 9781450398688
Sparse computations, such as sparse matrix-dense vector multiplication, are notoriously hard to optimize due to their irregularity and memory-boundedness. Solutions to improve the performance of sparse computations have been proposed, ranging from hardware-based ones, such as gather-scatter instructions, to software ones, such as generalized and dedicated sparse formats used together with specialized executor programs for different hardware targets. These sparse computations are often performed on read-only sparse structures: while the data themselves are variable, the sparsity structure itself does not change. Indeed, sparse formats such as CSR typically have a high cost for inserting or removing nonzero elements in the representation. The typical use case is therefore not to modify the sparsity during possibly repeated computations on the same sparse structure. In this work, we exploit the possibility of generating a specialized executor program dedicated to the particular sparsity structure of an input matrix. This creates opportunities to remove indirection arrays and synthesize regular, vectorizable code for such computations. But, at the same time, it introduces challenges in code size and instruction generation, as well as in efficient SIMD vectorization. We present novel techniques and extensive experimental results to efficiently generate SIMD vector code for data-specific sparse computations, and study the limits in terms of applicability and performance of our techniques compared to state-of-practice high-performance libraries like Intel MKL.
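A minimal sketch of the contrast the abstract draws: a baseline CSR SpMV that must read an indirection array (`col_idx`) at run time, versus a hypothetical executor specialized to one fixed sparsity structure, in which the column indices are baked into the generated code. The matrix shape and nonzero positions below are illustrative, not from the paper.

```python
import numpy as np

def spmv_csr(vals, col_idx, row_ptr, x):
    """Baseline CSR SpMV: the gather x[col_idx[k]] is an indirection
    that hinders straightforward SIMD vectorization."""
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y

def spmv_specialized(vals, x):
    """Hypothetical specialized executor for one fixed 3x3 structure
    with nonzeros at (0,0), (0,2), (1,1), (2,0), (2,2): column indices
    are constants in the code, so no indirection array is read."""
    return np.array([
        vals[0] * x[0] + vals[1] * x[2],
        vals[2] * x[1],
        vals[3] * x[0] + vals[4] * x[2],
    ])
```

The specialized form trades generality (and, for large matrices, code size) for regular, indirection-free arithmetic, which is the tension the abstract describes.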
ISBN:
(Print) 9781450367127
Irregular data structures, as exemplified by sparse matrices, have proved to be essential in modern computing. Numerous sparse formats have been investigated to improve the overall performance of sparse Matrix-Vector multiply (SpMV). In this work we instead take a fundamentally different approach: we automatically build sets of regular sub-computations by mining for regular sub-regions in the irregular data structure. Our approach leads to code that is specialized to the sparsity structure of the input matrix, but which no longer needs any indirection arrays, thereby improving SIMD vectorizability. We particularly focus on small sparse structures (below 10M nonzeros), and demonstrate substantial performance improvements and compaction capabilities compared to a classical CSR implementation and Intel MKL IE's SpMV implementation, evaluating on 200+ different matrices from the SuiteSparse repository.
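One simple instance of "mining for regular sub-regions" is finding maximal runs of consecutive column indices within each CSR row: each run can then be executed as a dense, indirection-free dot product. This sketch is only an illustration of the idea; the function name and run encoding are assumptions, not the paper's actual mining algorithm.

```python
def mine_regular_runs(col_idx, row_ptr):
    """For each CSR row, find maximal runs of consecutive column
    indices. Each run is returned as (row, start_col, length, first_nz),
    where first_nz indexes into the values array. Within a run, the
    access pattern is dense and regular, so no indirection is needed."""
    runs = []
    for i in range(len(row_ptr) - 1):
        k = row_ptr[i]
        while k < row_ptr[i + 1]:
            start = k
            while k + 1 < row_ptr[i + 1] and col_idx[k + 1] == col_idx[k] + 1:
                k += 1
            runs.append((i, col_idx[start], k - start + 1, start))
            k += 1
    return runs
```

For example, a row with column indices [0, 1, 2, 5] yields one run of length 3 starting at column 0 plus a singleton at column 5.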
We present a newly developed version of our solvers for the verified solution of dense parametric linear systems, i.e., linear systems whose system matrix and right-hand side depend affine-linearly on parameters that vary inside prescribed intervals. The solvers use our C++ class library for reliable computing, C-XSC. The C-XSC library provides many features, especially easy-to-handle data types for dense and sparse matrices and vectors, and the ability to compute dot products and dot product expressions in arbitrary precision. The new solvers can use either sparse or dense matrices as the coefficient matrices for the parameters. The use of sparse coefficient matrices can result in huge improvements in both performance and memory consumption. BLAS and LAPACK routines are used where applicable, and OpenMP is used for parallelization on multi-core and multi-processor systems. The solvers also provide the ability to compute not only an outer but also a componentwise inner enclosure of the solution set of the system, and to choose between two versions of the algorithm, one being very fast and one giving sharp results and extending the range of solvable systems. We give some examples of parametric linear systems (including real-world examples such as worst-case tolerance analysis of linear electric circuits), give performance measurements of our solvers, and also demonstrate that they scale very well when using multiple cores or processors.
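The affine-linear parameter dependence the abstract refers to has the form A(p) = A0 + Σ_k p_k·Ak and b(p) = b0 + Σ_k p_k·bk, where the coefficient matrices Ak are often very sparse. This sketch only assembles and solves the system at one fixed parameter point to show that structure; it is not a verified interval solver, and the names A0/Aks/b0/bks are illustrative, not the C-XSC interface.

```python
import numpy as np

def assemble(A0, Aks, b0, bks, p):
    """Build A(p) and b(p) for an affine-linear parametric system:
    A(p) = A0 + sum_k p[k] * Aks[k],  b(p) = b0 + sum_k p[k] * bks[k].
    The verified solvers enclose the solution set over interval-valued
    p; here p is a plain point value for illustration."""
    A = A0.copy()
    b = b0.copy()
    for pk, Ak, bk in zip(p, Aks, bks):
        A = A + pk * Ak
        b = b + pk * bk
    return A, b
```

Because each Ak typically touches only a few entries, storing them sparsely (as the new solver version allows) saves both memory and the cost of the per-parameter accumulation.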
Direct volume rendering is a popular technique for scientific visualization. The computation cost of direct volume rendering increases exponentially as the size of the volume dataset increases. Hence, efficient volume rendering has become an important issue. In this work, we study parallel volume rendering algorithms based on sparse data structures. In order to exploit object-space coherence, we propose to employ two sparse-matrix representation schemes as spatial data structures. To further reduce the processing time, we employ data-parallel volume rendering algorithms based on sparse data structures. Two distinct features of our work are: (a) the sparse data structures enable us to reduce the processing time as well as the memory storage requirement; and (b) parallel processing allows us to further speed up the volume rendering process. Experiments were conducted to assess our proposed scheme. Results show that our proposed data-parallel algorithms performed well on two different parallel distributed-memory systems.
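One way a sparse-matrix scheme serves as a spatial data structure, as the abstract describes, is to store each 2D volume slice in CSR form so that empty voxels are skipped entirely during compositing. The function below is a hypothetical sketch of that idea (the abstract does not specify CSR as one of its two schemes), using a simple opacity threshold.

```python
import numpy as np

def slice_to_csr(slice2d, threshold=0.0):
    """Convert one volume slice to CSR arrays, keeping only voxels
    whose value exceeds threshold. Rendering then iterates over stored
    voxels only, exploiting object-space coherence in sparse volumes."""
    vals, col_idx, row_ptr = [], [], [0]
    for row in slice2d:
        for j, v in enumerate(row):
            if v > threshold:
                vals.append(float(v))
                col_idx.append(j)
        row_ptr.append(len(vals))
    return np.array(vals), np.array(col_idx), np.array(row_ptr)
```

For a mostly empty volume, both the memory footprint and the per-slice work scale with the number of stored voxels rather than the full slice size, matching feature (a) above.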