ISBN: (print) 9781424465347
Removing redundant edges on a large graph is a fundamental problem in many practical applications such as verification of real-time systems and network routing. In this paper, we present the designs of scalable and efficient parallel algorithms for multiple many-core GPU devices using CUDA. Our algorithms expose substantial fine-grained parallelism while maintaining minimal global communication. By using the global scope of the GPU's global memory, coalescing the global memory reads and writes, and avoiding on-chip shared memory bank conflicts, we are able to achieve a large performance benefit with a speed-up of 2,500x on a desktop computer in comparison with a single-core CPU program. We report our experiments on large graphs with up to 29K vertices using multiple GPU devices.
We present sequential and parallel algorithms for the Frontier A* (FA*) algorithm augmented with a form of Delayed Duplicate Detection (DDD). The sequential algorithm, FA*-DDD, overcomes the leak-back problem associated w...
We revisit and use the dependence transformation method to generate parallel algorithms suitable for cluster and grid computing. We illustrate this method in two applications: to obtain a systolic matrix product algor...
Recent advances in the design of efficient parallel algorithms have largely focused on the now-classical model of parallel computing called Massively Parallel Computation (MPC), which follows the framework of...
In this paper we design and analyse parallel algorithms with the goal of obtaining exact bounds on their speed-ups on real machines. For this purpose we define an extension of Valiant's BSP model, BSP*, that rewards blo...
In this paper, an efficient parallel algorithm is proposed for finding a k-tree core of a tree network. The proposed algorithm performs on the EREW PRAM in O(log n log* n) time using O(n) work.
A visualization model has been developed to analyse the performance of a massively parallel algorithm. Most visualization tools that have been developed so far for performance analysis are based generally on individua...
For a given algorithm, the energy consumed in executing the algorithm has a nonlinear relationship with performance. In case of parallel algorithms, energy use and performance are functions of the structure of the alg...
We take advantage of the new tasking features in OpenMP to propose advanced task-parallel algorithms for the inversion of dense matrices via Gauss-Jordan elimination. Our algorithms perform a partitioning of the matri...
The problem of partitioning task graphs in its general form is known to be NP-complete and it is extremely difficult to come up with simple but effective and fast heuristics too. In this paper, the tree task graphs ar...