A mapping procedure for synthesizing uniform recurrence equations from the dynamic programming formulation of the knapsack problem is proposed. Two new systolic arrays are synthesized from the systems of recurrence eq...
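For orientation, the kind of dynamic programming recurrence such a mapping starts from can be sketched as follows. This is a minimal sequential sketch that assumes the 0/1 variant of the knapsack problem (the abstract does not say which variant is used); the function name `knapsack_01` and its parameters are illustrative only, and nothing here reproduces the systolic arrays themselves.

```python
def knapsack_01(weights, values, capacity):
    """Best total value for a 0/1 knapsack (assumed variant) of the given capacity.

    best[c] holds the best value achievable with capacity c using the items
    processed so far; this is the recurrence a systolic mapping would unroll.
    """
    best = [0] * (capacity + 1)
    for w, v in zip(weights, values):
        # Walk capacities downward so each item is used at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]
```

For example, `knapsack_01([1, 3, 4, 5], [1, 4, 5, 7], 7)` picks the items of weight 3 and 4.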
With the recent development of semiconductor integration technology, the amount of data that must be handled in the layout design of VLSI is increasing rapidly. Even if the improvement of the processing speed of the c...
A subsequence of a given string is any string obtained by deleting zero or more symbols from the given string. A longest common subsequence (LCS) of two strings is a common subsequence of both that is at least as long as any other common subsequence. The longest common subsequence problem is to find a longest common subsequence of two given strings. The bound on the complexity of this problem under the decision tree model is known to be mn if the number of distinct symbols that can appear in strings is infinite, where m and n are the lengths of the two strings, respectively, and m ≤ n. In this paper, we propose two parallel algorithms for this problem on the CREW-PRAM model. One takes O(log^2 m + log n) time with mn / log m processors, which is faster than all the existing algorithms on the same model. The other takes O(log^2 m log log m) time with mn / (log^2 m log log m) processors when log^2 m log log m > log n, or otherwise O(log n) time with mn / log n processors, which is optimal in the sense that the time × processors bound matches the complexity bound of the problem. Both algorithms exploit properties of the LCS problem that are established in this paper.
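As a point of reference for the mn bound, the textbook sequential dynamic program fills an (m+1) × (n+1) table, doing work proportional to mn. The sketch below is that standard sequential method, not either of the CREW-PRAM algorithms proposed in the paper; the function name `lcs` is illustrative.

```python
def lcs(a, b):
    """Return one longest common subsequence of strings a and b."""
    m, n = len(a), len(b)
    # table[i][j] = length of an LCS of a[:i] and b[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Walk back from the bottom-right corner to recover one LCS.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

An LCS is generally not unique, so the function returns whichever one the backtracking order reaches.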
In [2], a parallel perceptron learning algorithm on the single-channel broadcast communication model was proposed to speed up the learning of weights of perceptrons [3]. The results in [2] showed that, given n training examples, the average speedup is 1.48 n^0.91 / log n with n processors. Here, we explain how the parallelization may be modified so that it is applicable to any number of processors. Both analytical and experimental results show that the average speedup can reach nearly O(r) with r processors if r is much less than n.
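One way to picture running perceptron learning on r processors rather than n is to give each processor a block of the training examples and let one reported error per round drive the weight update. The sketch below simulates that idea sequentially. It is an assumption made for illustration, not the scheme of [2] or of this paper; the partitioning rule, the choice of which error "wins the broadcast", and the names `misclassified` and `train_partitioned` are all hypothetical.

```python
def misclassified(w, x, y):
    """True if weight vector w puts example x (label y in {-1, +1}) on the wrong side."""
    return y * sum(wi * xi for wi, xi in zip(w, x)) <= 0


def train_partitioned(examples, r, max_rounds=1000):
    """Perceptron learning with the examples split across r simulated processors."""
    dim = len(examples[0][0])
    w = [0.0] * dim
    # Assumed partitioning: processor k owns examples k, k + r, k + 2r, ...
    blocks = [examples[k::r] for k in range(r)]
    for _ in range(max_rounds):
        # Each simulated processor scans only its own block for an error.
        reports = [
            next(((x, y) for x, y in block if misclassified(w, x, y)), None)
            for block in blocks
        ]
        errors = [rep for rep in reports if rep is not None]
        if not errors:
            return w  # every example is classified correctly
        x, y = errors[0]  # assumed rule: the first reporting processor broadcasts
        w = [wi + y * xi for wi, xi in zip(w, x)]
    return w
```

Each example is a pair `(features, label)` with a constant bias feature included in `features`; on linearly separable data the loop stops once no block reports an error.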
A parallel adaptive-grid Navier-Stokes algorithm based on generic primitives has been developed. The parallel primitives are general for the class of explicit finite-volume Navier-Stokes numerical schemes. Furthermore, they allowed a relatively simple implementation of the algorithm on two different parallel systems: an eight-processor Cray Y-MP and the Connection Machine CM-2. A novel data structure for the adaptive grid allowed efficient parallel refinement/coarsening of the mesh. Substantial speedups compared to the corresponding sequential algorithm were realized on both systems.
It is well known that the availability of cost-effective and powerful parallel computers has enhanced the ability of the operations research community to solve laborious computational problems. But many researchers argue that the lack of portability of parallel algorithms is a major drawback to utilizing parallel computers. This paper studies the performance of a portable parallel unconstrained non-gradient optimization algorithm, executed on various shared-memory multiprocessor systems, compared with its non-portable counterpart. Analysis of covariance is used to analyse how the algorithm's performance is affected by several factors of interest. The results yield further insight into parallel computing.
This paper describes a new parallel algorithm for solving n-job, m-machine flow-shop problems. The algorithm is basically a parallelization of the usual branch-and-bound method. To keep the efficiency of parallel processing high, it also switches to an all-search method when a subproblem becomes smaller than a certain size. Its implementation on both the nCUBE2 and the LUNA88k2 is shown to give very good performance characteristics.
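The combination of branch-and-bound with a switch to full enumeration on small subproblems can be sketched sequentially as follows. The sketch assumes a permutation flow shop with makespan as the objective and reads "all search" as exhaustive enumeration; neither is stated in the abstract. The lower bound, the threshold parameter `exhaustive_below`, and the names `advance` and `flow_shop` are illustrative choices, and no parallelization is shown.

```python
from itertools import permutations


def advance(front, job_times):
    """Machine completion times after appending one job to a partial schedule."""
    new, prev = [], 0
    for k, t in enumerate(job_times):
        # The job starts on machine k once both the machine and the job are free.
        prev = max(prev, front[k]) + t
        new.append(prev)
    return new


def flow_shop(p, exhaustive_below=3):
    """p[j][k] = time of job j on machine k. Returns (makespan, job order)."""
    n, m = len(p), len(p[0])
    best = [float("inf"), None]

    def search(order, remaining, front):
        if not remaining:
            if front[-1] < best[0]:
                best[0], best[1] = front[-1], order
            return
        if len(remaining) <= exhaustive_below:
            # Small subproblem: enumerate every completion without bounding.
            for perm in permutations(remaining):
                f = front
                for j in perm:
                    f = advance(f, p[j])
                if f[-1] < best[0]:
                    best[0], best[1] = f[-1], order + list(perm)
            return
        # Simple bound: each machine must still process all remaining jobs.
        bound = max(front[k] + sum(p[j][k] for j in remaining) for k in range(m))
        if bound >= best[0]:
            return  # prune: this branch cannot beat the incumbent
        for j in remaining:
            rest = [r for r in remaining if r != j]
            search(order + [j], rest, advance(front, p[j]))

    search([], list(range(n)), [0] * m)
    return best[0], best[1]
```

Raising `exhaustive_below` trades bounding work for plain enumeration on the last few jobs, which is the trade-off the abstract alludes to.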
Adaptive local grid refinement/coarsening results in unequal distribution of work load among the processors of a parallel system. A novel method for balancing the load in cases of dynamically changing tetrahedral grids is developed. The approach employs local exchange of cells among processors to redistribute the load equally. An important part of the load-balancing algorithm is the method employed by a processor to determine which cells within its subdomain are to be exchanged. Two such methods are presented and compared. The strategy for load balancing is based on the divide-and-conquer approach that leads to an efficient parallel algorithm. This method is implemented on a distributed-memory multiple instruction multiple data system.
In this paper we present an O(log n) time parallel algorithm for arithmetic expression evaluation on an n × n processor array with a reconfigurable bus system, where n is the sum of the number of operators and constants in the expression. The basic technique involved is leaf cutting (the rake operation), as in the PRAM-model algorithms available in the literature for this problem. The input to our algorithm is assumed to be the binary tree associated with the given expression (also known as the expression tree, with n nodes). Our algorithm is faster than the previous best time for expression evaluation on mesh-connected computers, which is O(√n).
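To make the input concrete, an expression tree can be represented and evaluated sequentially as in the sketch below. This is only a sequential reference evaluator under an assumed tuple encoding `(operator, left, right)`; it does not implement the rake operation or the reconfigurable-bus algorithm, and the names `OPS` and `evaluate` are illustrative.

```python
import operator

# Assumed operator set for illustration.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(node):
    """Evaluate an expression tree given as a constant or (op, left, right)."""
    if not isinstance(node, tuple):
        return node  # leaf: a constant
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))
```

For instance, the tree `("+", 2, ("*", 3, 4))` has two operators and three constants, so n = 5 in the abstract's notation.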
Author:
UMLAND, T. Universität Karlsruhe
Informatik für Ingenieure und Naturwissenschaftler, Postfach 6980, Am Fasanengarten 5, 76128 Karlsruhe, Germany
This paper describes further parallel sorting algorithms for the hypercube. They are compared with a recently published parallel sorting procedure for the same topology, and it is shown that considering essential properties of the available hardware and its topology during the design of parallel algorithms is instructive.
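As background, one classic sort that maps naturally onto a hypercube is bitonic sort, where every compare-exchange pairs two nodes whose indices differ in exactly one bit, that is, hypercube neighbours. The sketch below simulates it sequentially with one key per node. It is offered only as a familiar example of the topology-aware style the abstract refers to; the abstract does not say which algorithms the paper describes, and the name `hypercube_bitonic_sort` is illustrative.

```python
def hypercube_bitonic_sort(keys):
    """Sort 2^d keys, one per simulated hypercube node, with bitonic sort."""
    n = len(keys)
    if n & (n - 1):
        raise ValueError("number of keys must be a power of two")
    a = list(keys)
    k = 2
    while k <= n:  # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:  # j selects the hypercube dimension used in this step
            for i in range(n):
                partner = i ^ j  # neighbour across that dimension
                if partner > i:
                    ascending = (i & k) == 0
                    # Swap when the pair is out of order for its direction.
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

On a real machine the inner loop over `i` would run on all nodes at once, giving d(d+1)/2 communication steps for a d-dimensional cube.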