Dynamic programming is an efficient technique to solve combinatorial search and optimization problem. There have been many parallel dynamic programming algorithms. The purpose of this paper is to study a family of dyn...
详细信息
ISBN:
(纸本)9781595936677
Dynamic programming is an efficient technique to solve combinatorial search and optimization problem. There have been many parallel dynamic programming algorithms. The purpose of this paper is to study a family of dynamic programming algorithm where data dependence appear between non-consecutive stages, in other words, the data dependence is non-uniform. This kind of dynamic programming is typically called nonserial polyadic dynamic programming. Owing to the: non-uniform data dependence;it is harder to optimize this problem for parallelism and locality on parallelarchitectures. In this paper, we address the chanllenge of exploiting fine grain parallelism and locality of nonserial polyadic dynamic programming on a multi-core architecture. We present a programming and execution model for multi-core architectures with memory hierarchy. In the framework of the new model, the parallelism and locality benifit from a data dependence transformation. We propose a parallel pipelined algorithm for filling the dynamic programming matrix by decomposing the computation operators. The new parallel algorithm tolerates the memory access latency using multi-thread and is easily improved with the technique. We formulate and analytically solve the optimization problem determing the the size that minimizes the total execution time. The experiments on a simulator give a validation of the proposed model and show that the fine grain parallel algorithm achieves sub-linear speedup and that a potential high scalability on multi-core arichitecture.
We prove a number of negative results about practical (i.e., numerically accurate) algorithms for certain matrix factorizations. In particular, we prove that the popular Givens' method for computing the QR decompo...
详细信息
ISBN:
(纸本)9780897918909
We prove a number of negative results about practical (i.e., numerically accurate) algorithms for certain matrix factorizations. In particular, we prove that the popular Givens' method for computing the QR decomposition is inherently sequential over the realistic model of floating point arithmetic. We also prove a number of additional results concerning Gaussian Elimination for computing the LU decomposition. Altogether, the results of this paper support the widespread belief that there is a tradeoff between parallelism and accuracy in numerical algorithms.
A library, called PAD, of basic parallelalgorithms and data structures for the PRAM is currently being implemented using the PRAM programming language Fork95. Main motivations of the PAD project is to study the PRAM ...
详细信息
A library, called PAD, of basic parallelalgorithms and data structures for the PRAM is currently being implemented using the PRAM programming language Fork95. Main motivations of the PAD project is to study the PRAM as a practical programming model, and to provide an organized collection of basic PRAM algorithms for the SB-PRAM under completion at the University of Saarbruecken. We give a brief survey of Fork95, and describe the main components of PAD. Finally we report on the status of the language and library and discuss further developments.
Scheduling problems that are critical and prevalent in practical parallel computing are computed. A polynomial time makespan algorithm that produces a schedule of length O(V+Φ log T), which is therefore an O(log T) a...
详细信息
ISBN:
(纸本)9780897918091
Scheduling problems that are critical and prevalent in practical parallel computing are computed. A polynomial time makespan algorithm that produces a schedule of length O(V+Φ log T), which is therefore an O(log T) approximation is presented to solve these problems. The makespan algorithm can be extended to minimize the weighted average completion time over all the jobs to the same approximation factor of O(log T).
作者:
Koo, JaehyunMIT
77 Massachusetts Ave Cambridge MA 02139 USA
We present an O(1)-round fully-scalable deterministic massively parallel algorithm for computing the min-plus matrix multiplication of unit-Monge matrices. We use this to derive a O(log n)-round fully-scalable massive...
详细信息
ISBN:
(纸本)9798400704161
We present an O(1)-round fully-scalable deterministic massively parallel algorithm for computing the min-plus matrix multiplication of unit-Monge matrices. We use this to derive a O(log n)-round fully-scalable massively parallel algorithm for solving the exact longest increasing subsequence (LIS) problem. For a fully-scalable MPC regime, this result substantially improves the previously known algorithm of O(log(4) n)-round complexity, and matches the best algorithm for computing the (1 + epsilon)-approximation of LIS.
作者:
AnonACM
Special Interest Group for Automata and Computability Theory New York NY USA ACM Special Interest Group for Automata and Computability Theory New York NY USA
This conference proceedings contains 50 papers. The topics covered include: algorithms;graph theory;parallel processing;automata;boolean circuits;computational geometry;cryptography;complexity;distributed computing.
This conference proceedings contains 50 papers. The topics covered include: algorithms;graph theory;parallel processing;automata;boolean circuits;computational geometry;cryptography;complexity;distributed computing.
暂无评论