Antonio, Tsai, and Huang proposed a scheme in 1991 to parallelize the standard dynamic programming approach to solving combinatorial multistage problems. However, their approach is restricted to multistage problems in which the decision made at each stage depends only on the decisions made in the immediately preceding stage. For many interesting problems the decision at each stage depends on the decisions made at all previous stages, so their approach does not apply. Examples include the Matrix Chain Multiplication problem, the Longest Common Subsequence problem, and the Optimal Polygon Triangulation problem. We present techniques for parallelizing the dynamic programming solution to such problems. The parallel algorithm we develop for a PRAM has complexity $\Theta(n)$ using $\Theta(n^2)$ processors. Since the traditional sequential algorithm for such problems is $\Theta(n^3)$, our parallel algorithm is optimal with respect to this traditional algorithm. We also describe experimental results that conform to our theoretical complexity results. Finally, we compare our result with those obtained by earlier researchers and show that our parallel algorithm achieves the optimal efficiency of 100% with respect to the traditional dynamic programming algorithm.
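To make the dependency structure concrete, here is a minimal sketch of the traditional sequential $\Theta(n^3)$ dynamic program for Matrix Chain Multiplication. It is the textbook algorithm, not the parallel scheme of the abstract; the function name `matrix_chain_cost` and the `dims` convention are assumptions made for illustration. Each entry depends on entries from all shorter sub-chains, while entries on the same diagonal are independent of one another.

```python
def matrix_chain_cost(dims):
    """Minimum number of scalar multiplications for a matrix chain.

    Hypothetical convention: matrix i (0-based) has shape dims[i] x dims[i + 1].
    """
    n = len(dims) - 1  # number of matrices in the chain
    if n <= 0:
        return 0
    cost = [[0] * n for _ in range(n)]
    # Process sub-chains by increasing length, i.e. one diagonal at a time.
    for length in range(2, n + 1):
        # Entries on one diagonal only read shorter sub-chains, so this
        # loop is the one a parallel version could distribute.
        for i in range(n - length + 1):
            j = i + length - 1
            # The split point k ranges over every earlier stage of the
            # sub-chain, not just the immediately preceding one.
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
    return cost[0][n - 1]
```

For example, `matrix_chain_cost([10, 30, 5, 60])` considers both parenthesizations of a three-matrix chain and returns the cheaper cost.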
Many combinatorial problems can be solved efficiently for partial k-trees (graphs of treewidth bounded by k). The edge-coloring problem is one of the well-known combinatorial problems for which no NC algorithm has been obtained for partial k-trees. This paper gives the first NC parallel algorithm, which is also optimal, for finding an edge-coloring of any given partial k-tree with bounded degree using a minimum number of colors. Throughout the paper, k is assumed to be bounded.
In this paper the relation between the matrix equation approach and the eigenspace approach to multi-time-scale decomposition of large-scale linear systems is analysed. Building on a synthesis of these two approaches, a new iteration method based on the dominant sub-eigenspace is presented and a parallel algorithm is derived. The new method is naturally parallel and preserves the original structural properties of some special systems.
A number of parallel algorithms based on higher-order Padé approximations are developed for the numerical solution of a mathematical model of percutaneous drug absorption. These methods are $L_0$-stable and require only the application of tridiagonal solvers with complex arithmetic.
Frequently, one needs to evaluate expressions of the form $[p(A)]^{-1} q(A) b$, where $A \in \mathbb{R}^{N \times N}$, $b \in \mathbb{R}^{N}$, and $p$ and $q$ are polynomials with $\deg q \le \deg p$ such that no zero of $p$ is an eigenvalue of $A$. Algorithms that evaluate $[p(A)]^{-1} q(A) b$ through the partial fraction representation of $q/p$ lend themselves well to implementation on a parallel computer, but may yield poor accuracy. We discuss how to determine an incomplete partial fraction representation of $q/p$ that allows parallel computation while retaining high accuracy.
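As an illustration of why the partial fraction form parallelizes, consider the simplest case. The assumptions here are not stated in the abstract: $\deg q < \deg p$, and $p$ has $m$ distinct zeros $\tau_1, \ldots, \tau_m$. Under those assumptions the full partial fraction representation gives:

```latex
\frac{q(z)}{p(z)} = \sum_{j=1}^{m} \frac{c_j}{z - \tau_j},
\qquad c_j = \frac{q(\tau_j)}{p'(\tau_j)},
\qquad\text{so}\qquad
[p(A)]^{-1} q(A)\, b = \sum_{j=1}^{m} c_j \,(A - \tau_j I)^{-1} b .
```

The $m$ shifted systems $(A - \tau_j I)\, x_j = b$ are independent, so each can be solved on a separate processor before the weighted sum is formed. When zeros of $p$ lie close together, $p'(\tau_j)$ is small and the coefficients $c_j$ become large, which is one way cancellation in the final sum can degrade accuracy.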
The fast Fourier transform (FFT), which has a wide variety of application areas, requires very high-speed computation. Since parallel processing is very attractive for high-speed FFT computation, many processor arrays and multiprocessor systems have been proposed together with efficient FFT algorithms. As a result of recent developments in VLSI technology, several massively parallel computers have been implemented on a commercial basis. The MasPar, one of the SIMD-type massively parallel computers, consists of an eight-neighbor processor array. This paper discusses parallel 1-D FFT algorithms on an eight-neighbor processor array. We propose three algorithms that differ in their data allocation methods, and then estimate and evaluate their processing time. With $N = N_r \times N_r$ processors, the processing time is estimated to be $2(N_r - 2)t_c + (\log_2 N_r)t_b$, where $t_c$ is the communication time between neighboring processors and $t_b$ is the execution time of the radix-4 butterfly computation. We also compare these algorithms with the conventional radix-2 FFT algorithm implemented on a mesh processor array, and show that the radix-4 FFT algorithms are faster than the radix-2 algorithms. The algorithms achieve high-speed FFT computation by combining the radix-4 FFT algorithm with the characteristics of the eight-neighbor processor array.
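The sketch below shows a sequential radix-4 decimation-in-time FFT and a helper that evaluates the processing-time estimate quoted above. It is a plain textbook formulation, not any of the three array algorithms of the abstract; the function names are hypothetical, and the input length is assumed to be a power of 4.

```python
import cmath
import math


def fft_radix4(x):
    """Recursive radix-4 decimation-in-time FFT (length must be a power of 4)."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 4 != 0:
        raise ValueError("length must be a power of 4")
    quarter = n // 4
    # Transform the four interleaved subsequences x[r], x[r + 4], ...
    f0, f1, f2, f3 = (fft_radix4(x[r::4]) for r in range(4))
    out = [0j] * n
    for k in range(quarter):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor W_n^k
        a, b, c, d = f0[k], w * f1[k], w * w * f2[k], w * w * w * f3[k]
        # Radix-4 butterfly: four outputs from four twiddled inputs.
        out[k] = a + b + c + d
        out[k + quarter] = a - 1j * b - c + 1j * d
        out[k + 2 * quarter] = a - b + c - d
        out[k + 3 * quarter] = a + 1j * b - c - 1j * d
    return out


def dft_naive(x):
    """Direct O(n^2) DFT, used only as a reference for checking."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


def estimated_time(n_r, t_c, t_b):
    """Evaluate 2(N_r - 2) t_c + (log2 N_r) t_b from the abstract."""
    return 2 * (n_r - 2) * t_c + math.log2(n_r) * t_b
```

A length-16 input takes two radix-4 stages, compared with four stages for radix 2, which is the source of the reduced butterfly count.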
A parallel algorithm for shape recognition is presented along with its implementation on a distributed-memory multiprocessor. Shape recognition is one of the fundamental problems of computer vision. We consider a shape to be composed of a set of small straight line segments tangential to the object. The recognition problem is to determine whether or not the test image contains a specified reference shape. The straight line Hough transform (SLHT) has been used to detect reference shapes. A signature-based parallel algorithm called SHARP is developed for shape recognition using the SLHT on a distributed-memory multiprocessor system. In the SHARP algorithm, the $(\theta, r)$ space is divided among the processors. The SHARP algorithm has been implemented on a Meiko transputer system with 32 nodes. We analyse the performance of the parallel algorithm using both theoretical and experimental techniques.
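The idea of dividing the $(\theta, r)$ space can be sketched as follows. This is not the SHARP algorithm: the function names, the interleaved split of $\theta$ indices, the rounding of $r$ to integer bins, and the sequential loop standing in for separate processors are all assumptions made for illustration.

```python
import math
from collections import Counter


def hough_votes(points, n_theta, theta_indices=None):
    """Accumulate straight-line Hough votes for the given theta indices."""
    if theta_indices is None:
        theta_indices = range(n_theta)
    votes = Counter()
    for t in theta_indices:
        theta = math.pi * t / n_theta
        c, s = math.cos(theta), math.sin(theta)
        for x, y in points:
            # Normal form of a line: r = x cos(theta) + y sin(theta).
            r = round(x * c + y * s)
            votes[(t, r)] += 1
    return votes


def hough_votes_partitioned(points, n_theta, n_workers):
    """Simulate workers that each own a disjoint slice of the theta axis."""
    total = Counter()
    for w in range(n_workers):
        owned = range(w, n_theta, n_workers)  # interleaved theta slice
        # Slices are disjoint, so merging is a plain union of accumulators.
        total.update(hough_votes(points, n_theta, owned))
    return total
```

Because every accumulator cell belongs to exactly one slice, the workers need no communication while voting; only the final merge combines their results.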
The mutual range-join of $k$ sets $S_1, S_2, \ldots, S_k$ is the set containing all tuples $(s_1, s_2, \ldots, s_k)$ that satisfy $e_1 \le |s_i - s_j| \le e_2$ for all $1 \le i \ne j \le k$, where $s_i \in S_i$ and $e_1 \le e_2$ are fixed constants. This paper presents an efficient parallel algorithm for computing the $k$-set mutual range-join on hypercube computers. The proposed algorithm uses a fast method to determine whether the differences between all pairs of numbers among $k$ given numbers lie within a given range, and applies the technique of permutation-based range-join [11]. To compute the mutual range-join of $k$ sets $S_1, S_2, \ldots, S_k$ on a hypercube of $p$ processors with $O(\sum_{i=1}^{k} n_i / p)$ local memory, where $p \le |S_i| = n_i$ for $1 \le i \le k$, our algorithm requires at most $O((k \log k / p) \prod_{i=1}^{k} n_i)$ data comparisons in the worst case. The algorithm is implemented in PVM and its performance is extensively evaluated on various input data.
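The definition can be made concrete with a small sequential sketch. The pairwise test below sorts the $k$ values, which is one $O(k \log k)$ way to check all pairs and is an assumption here, not necessarily the method of the paper; the function names are hypothetical, and the join itself is a brute-force enumeration rather than the hypercube algorithm.

```python
from itertools import product


def within_range(values, e1, e2):
    """True if e1 <= |a - b| <= e2 for every pair of entries in values."""
    s = sorted(values)
    if len(s) < 2:
        return True
    # The largest pairwise difference is max - min.
    if s[-1] - s[0] > e2:
        return False
    # The smallest pairwise difference occurs between sorted neighbours.
    return all(b - a >= e1 for a, b in zip(s, s[1:]))


def mutual_range_join(sets, e1, e2):
    """Brute-force mutual range-join over the Cartesian product of the sets."""
    return [t for t in product(*sets) if within_range(t, e1, e2)]
```

For example, `mutual_range_join([[1, 5], [3, 9], [6]], 2, 5)` tests the four candidate tuples and keeps those whose pairwise differences all lie between 2 and 5.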
Perfect elimination schemes (p.e.s.) occur in a number of important problems such as perfect Gaussian elimination. The main objective of this paper is to study the parallel computation of a p.e.s. of a triangulated (perfect elimination) graph $G = (V, E)$ with $n = |V|$ vertices. We start with the notion of partitioning a triangulated graph into a set of mutually disjoint adjacency-level sets, and we present a parallel algorithm, based mainly on the properties of the adjacency-level sets, that computes a p.e.s. in time $O(\log L \cdot \log H)$ using $L \cdot H \cdot n^2$ processors on a CRCW-PRAM. The adjacency-level sets of a triangulated graph can be computed in time $O(\log L)$ with $L \cdot H \cdot n^2$ processors on the same computational model. Here, $L < n$ and $H < n$ are the length and the height of the graph, respectively.
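For readers unfamiliar with the term, a perfect elimination scheme is an ordering of the vertices in which the neighbours of each vertex that come later in the ordering form a clique. The sketch below is a simple sequential checker for that property, not the parallel algorithm of the abstract; the function name and the adjacency-dictionary representation are assumptions made for illustration.

```python
def is_perfect_elimination_scheme(adj, order):
    """Check whether `order` is a perfect elimination scheme of the graph.

    adj maps each vertex to the set of its neighbours (undirected graph).
    """
    position = {v: i for i, v in enumerate(order)}
    for v in order:
        # Neighbours of v that are eliminated after v.
        later = [u for u in adj[v] if position[u] > position[v]]
        # They must be pairwise adjacent, i.e. form a clique.
        for i, a in enumerate(later):
            for b in later[i + 1:]:
                if b not in adj[a]:
                    return False
    return True
```

A chordless 4-cycle has no such ordering, because whichever vertex is eliminated first has two non-adjacent neighbours.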
In this paper we propose a VLSI-implementable architecture called the Cube Connected Tree (CCT), which has advantageous properties of both the tree and the hypercube. The structure has a fixed, low node degree for any network size, unlike the hypercube, where the node degree depends on the size of the hypercube. The degree-diameter product metric [26] of the CCT is low compared to that of a hypercube of comparable size. The CCT overcomes the data congestion problem near the root of the binary tree by having multiple roots in the structure, thereby enhancing the I/O bandwidth of the system. The complexity of the VLSI layout of this structure is addressed within the grid model of Thompson [12]. Spare links and PEs are used to enhance the fault tolerance capabilities of the system. The easy programmability of this structure is demonstrated by designing polylogarithmic algorithms for sorting and the discrete Fourier transform.