Background: Accurate and efficient RNA secondary structure prediction remains an important open problem in computational molecular biology. Historically, advances in computing technology have enabled faster and more a...
ISBN: (print) 9781665410168
Sparse general matrix-matrix multiplication (SpGEMM) is one of the most fundamental yet challenging sparse computation kernels. Because of its irregular computation pattern, SpGEMM frequently becomes the performance bottleneck in many scientific applications. Many prior state-of-the-art approaches use either dense or sparse accumulators to merge matrix rows as a critical component. Dense accumulators are efficient for small matrices but are infeasible for large or highly sparse matrices because of their high memory use and low cache efficiency. In this work, by segmenting the columns of the second input matrix, we propose a new SpGEMM algorithm that uses both a new sparse high-level overview of the matrix and small, fast dense accumulators that fit in cache. With that, our approach brings the benefits of dense accumulators to both large and highly sparse matrices. Our extensive experimental evaluation, carried out on three hardware platforms and on hundreds of sparse matrices from a variety of domains, shows that our algorithm outperforms state-of-the-art SpGEMM implementations.
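The idea of pairing column segments with a small dense accumulator can be illustrated with a minimal sketch. This is not the authors' algorithm: it omits their sparse high-level overview and any parallelism, and the names `split_columns`, `spgemm_segmented`, and `seg_width`, as well as the list-of-dicts row format, are assumptions made for illustration only.

```python
def split_columns(b_rows, n_cols, seg_width):
    """Split B (a list of {col: value} rows) into column segments."""
    n_segs = (n_cols + seg_width - 1) // seg_width
    segs = [[{} for _ in b_rows] for _ in range(n_segs)]
    for k, row in enumerate(b_rows):
        for j, value in row.items():
            segs[j // seg_width][k][j] = value
    return segs


def spgemm_segmented(a_rows, b_rows, n_cols, seg_width):
    """Row-wise C = A * B, one column segment of B at a time.

    The dense accumulator has only seg_width entries, so its size is
    independent of the total number of columns of B.
    """
    c_rows = [{} for _ in a_rows]
    for s, b_seg in enumerate(split_columns(b_rows, n_cols, seg_width)):
        seg_start = s * seg_width
        acc = [0.0] * seg_width        # small dense accumulator
        used = [False] * seg_width     # marks slots written for this row
        for i, a_row in enumerate(a_rows):
            touched = []
            for k, a_ik in a_row.items():
                for j, b_kj in b_seg[k].items():
                    local = j - seg_start
                    if not used[local]:
                        used[local] = True
                        touched.append(local)
                    acc[local] += a_ik * b_kj
            # Copy the touched slots into row i of C and reset them.
            for local in sorted(touched):
                c_rows[i][seg_start + local] = acc[local]
                acc[local] = 0.0
                used[local] = False
    return c_rows
```

Calling `spgemm_segmented(a_rows, b_rows, n_cols=4, seg_width=2)` processes B in two passes of two columns each. In a compiled implementation `seg_width` would presumably be chosen so that the accumulator stays resident in cache.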
In this work, we contribute a parallel implementation of the network simplex algorithm, which is used to solve the minimum cost flow problem. In the network simplex algorithm, finding an entering arc requires searching through many arcs to decide which one should be included in the spanning tree solution on the next iteration. We propose finding the entering arc in parallel, as this step often takes the majority of the execution time. A usual strategy is to pick, out of all possible candidates, the arc that violates the optimality condition the most. Scanning all arcs can take considerable time, so it is common to consider only a fixed number of arcs at a time, which is referred to as the block search pivoting rule. Arc scans can easily be done in parallel to find the best candidate, as the calculations are independent of each other. We used shared-memory parallelism with OpenMP along with vectorization using AVX instructions. We also tried adjusting block sizes to increase the parallel portion of the algorithm. Our dataset consists of various natural and synthetic graphs with sizes up to a billion arcs. Our experiments show that speedups of up to four are possible, though they are typically lower.
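The block search pivoting rule can be sketched as follows. This is a serial, simplified illustration rather than the authors' OpenMP/AVX implementation. It assumes every scanned arc is nonbasic at its lower bound, so an arc violates optimality when its reduced cost `costs[a] - pi[tails[a]] + pi[heads[a]]` is negative. The function name, the array layout, and the sign convention for the node potentials `pi` are all assumptions.

```python
def find_entering_arc(tails, heads, costs, pi, block_size, start=0):
    """Return (arc, next_start); arc is None if no arc violates optimality.

    Arcs are scanned in blocks of block_size, beginning at index start
    and wrapping around. The search stops at the first block containing
    a violating arc and returns the worst violator within that block.
    """
    m = len(costs)
    if m == 0:
        return None, 0
    pos = start % m
    scanned = 0
    while scanned < m:
        count = min(block_size, m - scanned)
        best_arc, best_rc = None, 0.0
        # Each reduced cost is independent of the others, so this loop
        # is the part that can be split across threads or vector lanes.
        for offset in range(count):
            a = (pos + offset) % m
            rc = costs[a] - pi[tails[a]] + pi[heads[a]]
            if rc < best_rc:
                best_arc, best_rc = a, rc
        pos = (pos + count) % m
        scanned += count
        if best_arc is not None:
            return best_arc, pos
    return None, pos
```

A larger `block_size` gives each scan more independent work, which is in line with the block-size adjustment mentioned above, at the cost of more reduced-cost evaluations per pivot.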
Grid-based spatially distributed hydrological modeling has become feasible with advances in watershed routing schemes, remote sensing technology, and computing resources. However, the need for long running times on a substantial set of computational resources prevents spatially detailed modeling programs from being widely used, particularly in fine-resolution, large-scale studies. Parallelizing computational tasks can mitigate this difficulty. We propose a novel way to improve the simulation efficiency of direct runoff transport processes by grouping watershed areas based on a time-area routing scheme. The proposed parallelization method was applied to simulating the runoff routing processes of two watersheds of different sizes and landscapes. The method substantially improved the computational efficiency of the time-area routing simulation with common computing resources. The efficiency of the parallelization was not limited by the hierarchical relationship between upstream and downstream catchments along the flow paths, which is made possible by the Lagrangian tracking of the time-area routing method.
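Why time-area routing can be computed independently of upstream and downstream ordering can be illustrated with a small sketch. It is not the authors' method. It assumes each grid cell is described by a hypothetical tuple `(travel_steps, area, excess)`, where `excess` is the runoff depth generated per time step, and it assumes cells are grouped by equal travel time to the outlet. Each cell's runoff is simply shifted by its own travel time, so the groups do not depend on one another.

```python
def route_group(cells, n_steps):
    """Outlet hydrograph (volume per step) contributed by one group."""
    hydrograph = [0.0] * n_steps
    for travel_steps, area, excess in cells:
        for t, depth in enumerate(excess):
            arrival = t + travel_steps
            if arrival < n_steps:
                hydrograph[arrival] += depth * area
    return hydrograph


def route_watershed(cells, n_steps):
    """Group cells by travel time, route each group, and sum the results."""
    groups = {}
    for cell in cells:
        groups.setdefault(cell[0], []).append(cell)
    # The groups are independent of each other, so each call below
    # could run on a separate worker.
    partials = [route_group(group, n_steps) for group in groups.values()]
    total = [0.0] * n_steps
    for partial in partials:
        for t, volume in enumerate(partial):
            total[t] += volume
    return total
```

Because the partial hydrographs are only added together at the end, the grouping affects load balance but not the result.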
OpenMP cannot handle some very common programming idioms like recursive control and list or tree data structures. We present the workqueuing model and show it as a natural, flexible, and easy to use extension of OpenM...