The divergence in the computer architecture landscape has resulted in different architectures being considered mainstream at the same time. For application and algorithm developers, a dilemma arises when one must focu...
详细信息
ISBN:
(纸本)9781479986484
The divergence in the computer architecture landscape has resulted in different architectures being considered mainstream at the same time. For application and algorithm developers, a dilemma arises when one must focus on using underlying architectural features to extract the best performance on each of these architectures, while writing portable code at the same time. We focus on this problem with graph analytics as our target application domain. In this paper, we present an abstraction-based methodology for performance-portable graph algorithm design on manycore architectures. We demonstrate our approach by systematically optimizing algorithms for the problems of breadth-first search, color propagation, and strongly connected components. We use Kokkos, a manycore library and programming model, for prototyping our algorithms. Our portable implementation of the strongly connected components algorithm on the NVIDIA Tesla K40M is up to 3.25x faster than a state-of-the-art parallel CPU implementation on a dual-socket Sandy Bridge compute node.
In this work we examine a new technique for approximate string matching using Markovian distance. Here we assume each character appears in a probabilistic way. By means of this idea, we introduce a notion of dissimila...
详细信息
A parallel subspace iteration method for solving eigenvalue problem of based on multi-core platform is presented, which can solve several extreme eigenpair in parallel. Compared with Jacobi-Davidson method, the dimens...
详细信息
Optimal multiple sequence alignment by dynamic programming, like many highly dimensional scientific computing problems, has failed to benefit from the improvements in computing performance brought about by multi-proce...
详细信息
ISBN:
(纸本)9780769534718
Optimal multiple sequence alignment by dynamic programming, like many highly dimensional scientific computing problems, has failed to benefit from the improvements in computing performance brought about by multi-processor systems, due to the lack of suitable scheme to manage partitioning and dependencies. A scheme for parallel implementation of the dynamic programming multiple sequence alignment is presented, based on a peer to peer design and a multidimensional array indexing method. This design results in up to 5-fold improvement compared to a previously described master/slave design, and scales favourably with the number of processors used. This study demonstrates an approach for parallelising multi-dimensional dynamic programming and similar algorithms utilizing multi-processor architectures.
By taking into account communication startup overhead and the assigned processor distribution order and by applying hashing technique, a novel sequence distribution strategy is presented and the parallel local alignme...
详细信息
Cryptosystem on conic curves, which is a new developing cryptography, becomes more widespread in these days. It is important to explore fast parallel algorithms to both encrypt and decrypt information in conic curves ...
详细信息
Graphs are powerful data representations favored in many computational domains. Modern GPUs have recently shown promising results in accelerating computationally challenging graph problems but their performance suffer...
详细信息
ISBN:
(纸本)9781450301190
Graphs are powerful data representations favored in many computational domains. Modern GPUs have recently shown promising results in accelerating computationally challenging graph problems but their performance suffers heavily when the graph structure is highly irregular, as most real-world graphs tend to be. In this study, we first observe that the poor performance is caused by work imbalance and is an artifact of a discrepancy between the GPU programming model and the underlying GPU architecture. We then propose a novel virtual warp-centric programming method that exposes the traits of underlying GPU architectures to users. Our method significantly improves the performance of applications with heavily imbalanced workloads, and enables trade-offs between workload imbalance and ALU underutilization for fine-tuning the performance. Our evaluation reveals that our method exhibits up to 9x speedup over previous GPU algorithms and 12x over single thread CPU execution on irregular graphs. When properly configured, it also yields up to 30% improvement over previous GPU algorithms on regular graphs. In addition to performance gains on graph algorithms, our programming method achieves 1.3x to 15.1x speedup on a set of GPU benchmark applications. Our study also confirms that the performance gap between GPUs and other multi-threaded CPU graph implementations is primarily due to the large difference in memory bandwidth.
This book constitutes the refereed proceedings of the 8th internationalsymposium on parallel Architecture, Algorithm and programming, PAAP 2017, held in Haikou, China, in June 2017. The 50 revised full papers and 7 r...
ISBN:
(数字)9789811064425
ISBN:
(纸本)9789811064418
This book constitutes the refereed proceedings of the 8th internationalsymposium on parallel Architecture, Algorithm and programming, PAAP 2017, held in Haikou, China, in June 2017. The 50 revised full papers and 7 revised short papers presented were carefully reviewed and selected from 192 submissions. The papers deal with research results and development activities in all aspects of parallelarchitectures, algorithms and programming techniques.
The coordination style programming language T-Cham extends chemical abstract machine (Cham) with transactions. The Cham is an interactive computational model based on chemical reaction metaphor, where a computation pr...
详细信息
The coordination style programming language T-Cham extends chemical abstract machine (Cham) with transactions. The Cham is an interactive computational model based on chemical reaction metaphor, where a computation proceeds as a succession of chemical reactions. A transaction is a piece of sequentially executed codes and could be written in any language, such as C, Pascal, or Fortran etc., as long as it satisfies its pre-condition and post-condition. Every transaction begins its execution whenever its execution condition is satisfied. A T-Cham program can be executed in a parallel, distributed, or sequential manner based on the available computer resources.
暂无评论