This paper discusses preconditioners for the Conjugate Gradient Method which are based on splittings of the system matrix. Conditions for the convergence are given, and particular splittings are chosen in order to implement the method on a distributed memory multiprocessor.
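As a rough illustration of a splitting-based preconditioner, the sketch below runs preconditioned conjugate gradients with the simplest splitting A = M - N, where M = diag(A) (the Jacobi splitting). The choice of splitting, the function names, and the dense list-of-lists storage are assumptions made for this example; the abstract does not say which splittings the paper uses, and nothing here models a distributed-memory machine.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg_jacobi(A, b, tol=1e-10, max_iter=1000):
    """Preconditioned CG with M = diag(A) taken from the splitting A = M - N.

    Assumes A is symmetric positive definite. Returns (x, iterations).
    """
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]   # applying M^{-1} is a scaling
    x = [0.0] * n
    r = list(b)                                    # residual b - A x for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]     # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0               # avoid dividing by zero if b = 0
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

The Jacobi choice is convenient here because applying M^{-1} is an independent scaling of each component; other splittings would change only the two lines that compute `z`.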
Let Ax = b be a linear system where A is a symmetric positive definite matrix. Preconditioners for the conjugate gradient method based on multisplittings obtained by incomplete Choleski factorizations of A are studied. The validity of these preconditioners when A is an M-matrix is proved and a parallel implementation is presented.
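For readers unfamiliar with incomplete Cholesky factorization, the following is a minimal sketch of the zero-fill variant, IC(0), which keeps only entries that are nonzero in the lower triangle of A. It is a single-factorization illustration under the assumption of dense list-of-lists storage; the function name is hypothetical, and the multisplitting construction and parallel implementation studied in the paper are not reproduced.

```python
import math


def incomplete_cholesky0(A):
    """IC(0): Cholesky restricted to the nonzero pattern of A's lower triangle.

    Assumes A is symmetric and that the factorization does not break down
    (for example, A is a symmetric M-matrix). Returns lower-triangular L.
    """
    n = len(A)
    L = [[float(A[i][j]) if j <= i else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        L[k][k] = math.sqrt(L[k][k])
        for i in range(k + 1, n):
            if A[i][k] != 0.0:
                L[i][k] /= L[k][k]
        for j in range(k + 1, n):
            for i in range(j, n):
                if A[i][j] != 0.0:      # update existing nonzeros only: no fill-in
                    L[i][j] -= L[i][k] * L[j][k]
    return L
```

For a tridiagonal matrix no fill-in occurs, so IC(0) coincides with the exact Cholesky factor; for matrices with a richer pattern, L Lᵀ only approximates A, which is what makes it usable as a preconditioner.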
Let $C = \{c_1, \ldots, c_m\}$ be a family of subsets of a finite set $S = \{1, \ldots, n\}$. A subset $S'$ of $S$ is a co-hitting set if $S'$ contains no element of $C$ as a subset. By using an $O((\log n)^2)$ time EREW PRAM algorithm for the maximal independent set problem (MIS), we show that a maximal co-hitting set for $S$ can be computed on an EREW PRAM in time $O(\alpha\beta(\log(n+m))^2)$ using $O(n^2 m)$ processors, where $\alpha = \max\{|c_i| : i = 1, \ldots, m\}$ and $\beta = \max\{|d_j| : j = 1, \ldots, n\}$ with $d_j = \{c_i \mid j \in c_i\}$. This implies that if $\alpha\beta = O((\log(n+m))^k)$ then the problem is solvable in NC.
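To make the definition concrete, here is a small sequential sketch that builds a maximal co-hitting set greedily. It is only a reference point for the definition, not the EREW PRAM algorithm of the paper; the function name and the use of Python sets are assumptions of this example, and every member of C is assumed to be non-empty (an empty member would be a subset of every S', so no co-hitting set would exist).

```python
def maximal_co_hitting_set(n, family):
    """Greedily build a maximal S' within {1..n} containing no member of `family`.

    Sequential reference only. Each element is tried once; it is kept
    unless adding it would make some member of the family a subset of S'.
    """
    sets = [frozenset(c) for c in family]
    if any(len(c) == 0 for c in sets):
        raise ValueError("an empty member of C is a subset of every S'")
    chosen = set()
    for x in range(1, n + 1):
        candidate = chosen | {x}
        if not any(c <= candidate for c in sets):
            chosen.add(x)
    return chosen
```

The result is maximal because any rejected element was rejected against a subset of the final S', so adding it later would still complete some member of C.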
ISBN: 9780818671203 (print)
We simulate ballistic particle deposition, in which a large number of spherical particles are “dropped” vertically onto a planar horizontal surface. Upon first contact (with the surface or with a previously deposited particle) each particle stops. This model helps materials scientists study adsorption and sediment formation [1]. The model is sequential, with particles deposited one by one. We have found an equivalent formulation using a continuous-time random process, and we simulate the latter in parallel using a method similar to the one previously employed for simulating Ising spins [2]. We augment the parallel algorithm for simulating Ising spins with several techniques that make generating the particle configuration and collecting statistics more efficient. Some of these techniques are similar to those of [3], [4], [5]. We implement the resulting algorithm on a 16K PE MasPar MP-1 and a 4K PE MasPar MP-2. The parallel code runs on the MasPar computers two orders of magnitude faster than an optimized sequential code runs on a fast workstation.
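The sequential model described above (drop one sphere at a time, stop at first contact) can be sketched as follows. This is a naive quadratic-time illustration and not the continuous-time parallel formulation of the paper; equal radii, a square drop region, uniform random drop positions, and all function names are assumptions of this example.

```python
import math
import random


def drop_sphere(centers, x, y, r):
    """Return the resting height z of a sphere of radius r dropped at (x, y).

    `centers` holds (cx, cy, cz) of already deposited spheres of the same
    radius. The sphere stops at the highest first-contact position.
    """
    z = r  # resting on the plane z = 0
    for cx, cy, cz in centers:
        d2 = (x - cx) ** 2 + (y - cy) ** 2
        if d2 < 4 * r * r:  # horizontal overlap, so the spheres will touch
            z = max(z, cz + math.sqrt(4 * r * r - d2))
    return z


def deposit(num, side, r, seed=0):
    """Deposit `num` spheres one by one over a side x side square."""
    rng = random.Random(seed)
    centers = []
    for _ in range(num):
        x, y = rng.uniform(0, side), rng.uniform(0, side)
        centers.append((x, y, drop_sphere(centers, x, y, r)))
    return centers
```

Each drop scans all earlier spheres, so depositing N particles costs on the order of N² distance checks; a spatial grid would be the usual remedy in a serious sequential code.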
ISBN: 0780312333 (print)
Let $G = (V, E)$ be a simple planar graph that contains no odd cycles, with $|V| = n$ and $|E| =$ ***. In this paper, we propose an efficient parallel algorithm for edge-coloring $G$ with $\Delta$ colors on the SIMD-CRCW PRAM, a shared-memory model in which many processors can read and write a unit ***. Here, $\Delta$ is the maximum degree of the vertices of $G$. If $\Delta$ is an even number, the algorithm requires $O(\log\Delta \cdot \log n)$ time and $O(n\Delta)$ processors; otherwise it requires $O(\log\Delta \cdot \log n + \Delta n)$ time and $O(n\Delta)$ processors. If $\Delta = O(\log^{(1)} n)$, the algorithm is efficient.
Shape recognition is an important research area in pattern recognition, with wide practical applications in many fields. An attribute grammar approach to shape recognition combines the advantages of syntactic and statistical methods and makes shape recognition more accurate and efficient. However, the time complexity of a sequential shape recognition algorithm using attribute grammar is $O(n^3)$, where $n$ is the length of the input string. When the problem size is very large, the computing time becomes prohibitive, so high-speed parallel shape recognition is necessary to meet the demands of some real-time applications. This paper presents a parallel shape recognition algorithm and also discusses the algorithm partition problem as well as its implementation on a fixed-size VLSI architecture. The proposed algorithm has time complexity $O(n^3/k^2)$ when using $k \times k$ processing elements. When $k = n$, its time complexity is $O(n)$. Experiments were conducted to verify the performance of the proposed algorithm. The correctness of the algorithm partition and the behavior of the proposed VLSI architecture were also confirmed experimentally. The results indicate that the proposed algorithm and VLSI architecture could be very useful in image processing, pattern recognition and related areas, especially for real-time applications.
This paper shows that the prefix-sums of n binary values can be computed in time on an n × m reconfigurable mesh of the word model. It also shows that prefix-sums of n binary values can be computed in time on an n × m reconfigurable mesh of the word model if the reconfigurable mesh has communication capability that allows simultaneous sending to the same bus.
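For reference, the prefix-sums of binary values b_1, ..., b_n are the running counts b_1, b_1 + b_2, and so on. The sketch below computes them with a generic doubling scheme, in which each round could in principle be carried out by all positions at once. It is a textbook data-parallel pattern simulated sequentially, not the reconfigurable-mesh algorithm of the paper, and the function name is an assumption of this example.

```python
def prefix_sums_doubling(bits):
    """Inclusive prefix sums by repeated doubling.

    In each round, every position i >= step adds the value held `step`
    places to its left. After about log2(n) rounds, position i holds
    bits[0] + ... + bits[i].
    """
    sums = list(bits)
    n = len(sums)
    step = 1
    while step < n:
        # The list comprehension reads only the previous round's values,
        # mimicking a synchronous parallel step.
        sums = [sums[i] + (sums[i - step] if i >= step else 0) for i in range(n)]
        step *= 2
    return sums
```

For example, the input `[1, 0, 1, 1, 0, 1]` yields `[1, 1, 2, 3, 3, 4]`.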
We present a simple systolic algorithm for implementing a dictionary machine based on VLSI technology. Our design makes use of a dynamic, global tree rebalancing scheme to attain high system throughput. Our scheme is simple to implement and requires little sophistication in the design of the processing nodes. Results from analysis and simulation show that our algorithm has optimal response time and achieves an average latency close to 1. This represents a significant improvement over many previous designs. Unlike most parallel dictionary machines reported in the literature, our approach requires no compression operations.
Given a pattern of length $m$ and a text of length $n$, typically with $m \ll n$, this paper presents a randomized parallel algorithm for pattern matching that runs in $O(n^{1/10})$ ($= O(n^{1/10} + (n - m)^{1/10})$) time on a newly proposed $n^{3/5} \times n^{2/5}$ modular mesh-connected computer with multiple buses. Furthermore, the time bound of the algorithm can be reduced to $O(n^{1/11})$ if fewer processors are used.
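The abstract does not say how the paper's algorithm uses randomness. As a familiar sequential example of randomized pattern matching, the sketch below uses Karp-Rabin fingerprinting with a randomly chosen base, a technique swapped in here purely for illustration. The function name, the fixed modulus, and the `seed` parameter are assumptions of this example, and every fingerprint match is verified by direct comparison so the output is always exact.

```python
import random


def find_occurrences(pattern, text, seed=None):
    """Return all start indices where `pattern` occurs in `text`.

    Karp-Rabin rolling fingerprints with a random base; candidate
    positions are confirmed by direct comparison.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    rng = random.Random(seed)
    mod = (1 << 61) - 1                 # a Mersenne prime used as modulus
    base = rng.randrange(256, mod)      # the random choice
    high = pow(base, m - 1, mod)        # weight of the window's first character
    hp = ht = 0
    for i in range(m):
        hp = (hp * base + ord(pattern[i])) % mod
        ht = (ht * base + ord(text[i])) % mod
    hits = []
    for s in range(n - m + 1):
        if hp == ht and text[s:s + m] == pattern:
            hits.append(s)
        if s + m < n:
            # Slide the window: drop text[s], append text[s + m].
            ht = ((ht - ord(text[s]) * high) * base + ord(text[s + m])) % mod
    return hits
```

Randomizing the base makes it unlikely that any fixed input triggers many false fingerprint matches, so the expected running time stays close to linear in n + m.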
In this paper we introduce a class of trees called generalized compressed trees. Generalized compressed trees can be derived from complete binary trees by performing certain 'contraction' operations. A generalized compressed tree CT of height h has approximately 25% fewer nodes than a complete binary tree T of height h. We show that these trees have smaller (up to a 74% reduction) 2-dimensional and 3-dimensional VLSI layouts than complete binary trees. We also show that algorithms initially designed for T can be simulated by CT with at most a constant slow-down. In particular, algorithms that have a non-pipelined computation structure and were originally designed for T can be simulated by CT with no slow-down.