A stabilized parallel algorithm for direct-form recursive filters is obtained using a new method of derivation in the Z domain. The algorithm is regular and modular, so very efficient VLSI architectures can be constructed to implement it. The degree of parallelism in these implementations can be chosen freely and is not restricted to a power of two.
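To make the idea of a freely chosen degree of parallelism concrete, here is a minimal Python sketch of the classic look-ahead (block) transformation for the simplest direct-form recursive filter, the first-order section $y[n] = a\,y[n-1] + x[n]$. It is an illustration under stated assumptions, not the stabilized Z-domain algorithm of the paper. Unrolling the recursion $L$ times gives $y[n] = a^L y[n-L] + \sum_{k=0}^{L-1} a^k x[n-k]$, so the FIR part can be computed independently for every $n$, and the remaining $L$ interleaved recursions are independent of one another. The function names and the block size `L` are hypothetical; any positive integer works, including values that are not powers of two.

```python
def iir_sequential(x, a):
    """Reference first-order recursive filter y[n] = a*y[n-1] + x[n], zero initial state."""
    y, prev = [], 0.0
    for v in x:
        prev = a * prev + v
        y.append(prev)
    return y


def iir_lookahead(x, a, L):
    """Same filter via L-step look-ahead (a sketch; L is the degree of parallelism)."""
    n = len(x)
    # FIR stage: every z[i] is independent, so this loop could run fully in parallel
    z = [sum(a ** k * x[i - k] for k in range(L) if i - k >= 0) for i in range(n)]
    aL = a ** L
    y = [0.0] * n
    # L interleaved recursions with coefficient a**L; each phase is independent
    for phase in range(L):
        prev = 0.0
        for i in range(phase, n, L):
            prev = aL * prev + z[i]
            y[i] = prev
    return y
```

For $|a| < 1$ the pole $a^L$ of the transformed recursion also lies inside the unit circle, so this first-order case stays stable; higher-order direct-form filters need more care.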
All sequential algorithms for sampling select items one at a time, updating a data structure after each selection, and none lends itself to straightforward parallelization. A fast sampling algorithm, SAMPLE, is presented for the concurrent-read exclusive-write parallel random access machine (CREW PRAM). SAMPLE is a parallel algorithm for drawing an unbiased random sample of M items from a population of size N, where N is typically much larger than M. It relies on the sorting permutation and its inverse: the sorted array and the inverse permutation are obtained simultaneously by sorting tuples. When the sample size equals the population size (M = N), SAMPLE generates a random permutation of N elements in O(log M) time with M processors. It uses O(M) space. The random numbers used in the algorithm may be generated by a parallel pseudorandom number generator. The loop bounds in SAMPLE are also analyzed, yielding a theorem.
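SAMPLE relies on a sorting permutation and its inverse. The following Python sketch shows that ingredient sequentially: each index is tagged with a random key, the (key, index) tuples are sorted, and ranks are scattered to obtain the inverse permutation. It is an illustration under our own assumptions, not SAMPLE itself: it sorts all N keys, whereas SAMPLE is designed to run in O(log M) time with M processors. The function names are hypothetical.

```python
import random


def random_permutation_by_sorting(n, rng=None):
    """Return (perm, inv): a random permutation of range(n) and its inverse."""
    rng = rng or random.Random()
    # Tag each index with a random key; the index breaks (very unlikely) key ties
    tuples = sorted((rng.random(), i) for i in range(n))
    perm = [i for _, i in tuples]  # perm[r] = index whose key has rank r
    inv = [0] * n
    # Scatter step: writes go to distinct cells, so on a PRAM each r could be
    # handled by its own processor without write conflicts
    for r, i in enumerate(perm):
        inv[i] = r
    return perm, inv


def sample_by_sorting(population, m, rng=None):
    """Draw m distinct items by taking the first m entries of a random permutation."""
    perm, _ = random_permutation_by_sorting(len(population), rng)
    return [population[i] for i in perm[:m]]
```

Ties between 53-bit floating-point keys are extremely unlikely; when they occur they are resolved by index, which would introduce a tiny bias.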
The medial axis transform (MAT) is an image representation scheme. For a binary image, the MAT is defined as the set of upright maximal squares that consist entirely of pixels of value 1. The MAT plays an important role in image understanding. This paper presents a parallel algorithm for computing the MAT of an $n \times n$ binary image. We show that the algorithm can be performed in $O(\log n)$ time using $n^2/\log n$ processors on the EREW PRAM and in $O(\log\log n)$ time using $n^2/\log\log n$ processors on the common CRCW PRAM. We also show that the algorithm can be performed in $O(n^2/p^2 + n)$ time on a $p \times p$ mesh and in $O(n^2/p^2 + (n \log p)/p)$ time on a $p^2$-processor hypercube (for $1 \le p \le n$). The algorithm is cost optimal on the PRAMs, on the mesh (for $1 \le p \le \sqrt{n}$), and on the hypercube (for $1 \le p \le n/\log n$).
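The definition of the MAT as the set of maximal all-1 squares can be checked with a short sequential Python sketch. It uses the standard dynamic-programming recurrence for the largest all-1 square ending at each pixel, then keeps a square only if no square one pixel larger contains it. This is an illustrative reference implementation under our own assumptions (the function name `maximal_squares` and the output as top row, left column, and side length are hypothetical), not the paper's PRAM, mesh, or hypercube algorithm.

```python
def maximal_squares(img):
    """Return (top_row, left_col, side) for every maximal all-1 upright square."""
    rows = len(img)
    cols = len(img[0]) if rows else 0
    # s[i][j] = side of the largest all-1 square whose bottom-right pixel is (i, j)
    s = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if img[i][j]:
                if i == 0 or j == 0:
                    s[i][j] = 1
                else:
                    s[i][j] = 1 + min(s[i - 1][j], s[i][j - 1], s[i - 1][j - 1])
    result = []
    for i in range(rows):
        for j in range(cols):
            k = s[i][j]
            if k == 0:
                continue
            # A square one pixel larger that contains this one must end
            # one step down, one step right, or both
            contained = any(
                ii < rows and jj < cols and s[ii][jj] > k
                for ii, jj in ((i + 1, j), (i, j + 1), (i + 1, j + 1))
            )
            if not contained:
                result.append((i - k + 1, j - k + 1, k))
    return result
```

Smaller squares with the same bottom-right pixel are nested inside the largest one, so only `s[i][j]` needs to be tested at each pixel.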
Suppose that $0 < \eta < 1$ is given. We call a graph $G$ on $n$ vertices an $\eta$-Chvátal graph if its degree sequence $d_1 \le d_2 \le \dots \le d_n$ satisfies: for $k < n/2$, $d_k \le \min\{k + \eta n, n/2\}$ implies $d_{n-k-\eta n} \ge n - k$. (Thus for $\eta = 0$ we get the well-known Chvátal graphs.) An $NC^4$ algorithm is presented that accepts an $\eta$-Chvátal graph as input and outputs a Hamiltonian cycle in $G$. This is a significant improvement on the previous best NC algorithm for the problem, which finds a Hamiltonian cycle only in Dirac graphs ($\delta(G) \ge n/2$, where $\delta(G)$ is the minimum degree in $G$). (C) 2008 Elsevier B.V. All rights reserved.
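The $\eta$-Chvátal condition is a test on the sorted degree sequence, so it can be checked directly. The following Python sketch does so under one simplifying assumption of ours: $\eta n$ is passed as an integer `eta_n`, so that the index $n - k - \eta n$ is well defined. The function name is hypothetical, and the sketch only recognizes the input class; it does not construct the Hamiltonian cycle.

```python
def is_eta_chvatal(degrees, eta_n):
    """Check the eta-Chvatal degree condition; eta_n is the integer eta*n (assumed)."""
    d = sorted(degrees)  # d[k - 1] is d_k in the 1-indexed notation
    n = len(d)
    for k in range(1, (n + 1) // 2):  # all integers k with 1 <= k < n/2
        if d[k - 1] <= min(k + eta_n, n / 2):
            idx = n - k - eta_n  # 1-indexed position n - k - eta*n
            if idx < 1:
                raise ValueError("eta_n is too large for this n")
            if d[idx - 1] < n - k:
                return False
    return True
```

With `eta_n = 0` the function reduces to Chvátal's classical condition. The 4-cycle, with degree sequence `[2, 2, 2, 2]`, passes it for `eta_n = 0` but fails for `eta_n = 1`.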
Based on the finite element solution of the parametric variational principle for elastic contact problems, this paper develops a corresponding parallel algorithm that exploits the characteristics of parallel computers and concurrent-processing architectures. In this algorithm, parallelism is realized in forming and assembling the stiffness matrix, in static condensation, in solving for stresses, and in several other stages. The algorithm has been implemented on the ELXSI-6400 parallel computer at Xi'an Jiaotong University. The computational results show that it reduces computation time effectively and is an effective parallel algorithm for analyzing contact problems.
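One of the parallelized stages named above, forming element stiffness matrices and assembling the global stiffness matrix, can be sketched in a few lines of Python. The sketch assumes the simplest possible element, a 1D two-node bar with stiffness $EA/L$, and uses `concurrent.futures.ThreadPoolExecutor` to form element matrices concurrently before a sequential assembly. The element type, function names, and worker count are illustrative assumptions; the paper's elastic-contact formulation and its ELXSI-6400 implementation are not reproduced here.

```python
from concurrent.futures import ThreadPoolExecutor


def bar_element_stiffness(ea_over_l):
    """2x2 stiffness matrix of a two-node 1D bar element with stiffness EA/L."""
    k = ea_over_l
    return [[k, -k], [-k, k]]


def assemble_global_stiffness(element_stiffness, workers=4):
    """Element e connects nodes e and e+1 (a chain of bars, assumed for illustration)."""
    n_nodes = len(element_stiffness) + 1
    # Element matrices are independent of one another, so they can be formed concurrently
    with ThreadPoolExecutor(max_workers=workers) as pool:
        local = list(pool.map(bar_element_stiffness, element_stiffness))
    # Assembly: add each element matrix into the rows/columns of its two nodes
    K = [[0.0] * n_nodes for _ in range(n_nodes)]
    for e, ke in enumerate(local):
        dofs = (e, e + 1)
        for r in range(2):
            for c in range(2):
                K[dofs[r]][dofs[c]] += ke[r][c]
    return K
```

In CPython, threads do not speed up pure-Python arithmetic because of the global interpreter lock; `ProcessPoolExecutor` or a compiled kernel would be needed for real speedups, but the decomposition into independent element computations followed by assembly is the same.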
Semiconductor manufacturing consists of many processes, and even a small fault at any point can degrade product quality. Fast and accurate detection of such faults is essential for maintaining high manufacturing yields. In this paper, we propose a parallel algorithm for fault detection in semiconductor manufacturing processes. The algorithm is a modification of HOT SAX, a discord detection algorithm that uses the SAX representation of time series for efficient storage and computation. We first propose a sequential algorithm and then extend it to a parallel version. We evaluate our algorithm experimentally on data from a real-world semiconductor plasma etching process. In these experiments, our fault detection algorithm achieved 100% accuracy, with no false positives or false negatives.
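HOT SAX builds on the SAX representation and on the notion of a discord, the subsequence whose distance to its nearest non-overlapping neighbor is largest. The following Python sketch shows both ingredients: a SAX word computed by z-normalization, piecewise aggregate approximation (PAA), and equiprobable Gaussian breakpoints, plus the brute-force discord search that HOT SAX accelerates by using SAX words to order its loops. It is a sequential illustration under our own assumptions (raw Euclidean distance between subsequences, series length divisible by the number of PAA segments, hypothetical function names), not the paper's modified or parallel algorithm.

```python
import math
from statistics import NormalDist


def znorm(x, eps=1e-8):
    """Z-normalize a sequence; near-constant sequences map to all zeros."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    if sd < eps:
        return [0.0] * len(x)
    return [(v - mu) / sd for v in x]


def paa(x, segments):
    """Piecewise aggregate approximation: the mean of each equal-width segment."""
    if len(x) % segments:
        raise ValueError("this sketch assumes len(x) is divisible by segments")
    w = len(x) // segments
    return [sum(x[i * w:(i + 1) * w]) / w for i in range(segments)]


def sax_word(x, segments, alphabet_size):
    """Map a sequence to a SAX word using equiprobable N(0, 1) breakpoints."""
    nd = NormalDist()
    breakpoints = [nd.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]
    # Each PAA value becomes the letter of the breakpoint interval it falls in
    return "".join(
        chr(ord("a") + sum(v > b for b in breakpoints))
        for v in paa(znorm(x), segments)
    )


def brute_force_discord(series, m):
    """Return (distance, start) of the length-m subsequence farthest from its
    nearest non-overlapping neighbor."""
    n = len(series) - m + 1
    best_dist, best_loc = -1.0, None
    for i in range(n):
        nearest = math.inf
        for j in range(n):
            if abs(i - j) >= m:  # skip self matches and overlapping (trivial) matches
                nearest = min(nearest, math.dist(series[i:i + m], series[j:j + m]))
        if nearest != math.inf and nearest > best_dist:
            best_dist, best_loc = nearest, i
    return best_dist, best_loc
```

HOT SAX finds the same top discord as this quadratic search (up to ties) but skips most inner-loop distance computations.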
A fast parallel algorithm that finds a satisfying truth assignment for a 2-CNF formula is proposed. The input to the algorithm is a formula that is the conjunction of a given number of clauses, each of which is the disjunction of exactly two literals, over a given number of Boolean variables. The algorithm determines whether the input formula is satisfiable and, if so, finds a truth assignment to the variables that satisfies the formula. The implementation of the algorithm on a concurrent-read concurrent-write parallel random access machine (CRCW PRAM) is described. The input data structures are (1) the number of clauses, (2) the number of variables, and (3) an array whose length is the number of clauses, where the entry for each clause holds the indexes of the two literals that occur in it. The output data structures are (1) a Boolean variable indicating whether the formula is satisfiable and (2) an array whose length is twice the number of variables.
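For comparison with the parallel algorithm, a sequential 2-CNF solver can be written with the standard implication-graph method: each clause $(a \lor b)$ yields the implications $\lnot a \Rightarrow b$ and $\lnot b \Rightarrow a$; the formula is unsatisfiable exactly when some variable and its negation lie in the same strongly connected component; otherwise, the topological order of the components gives a satisfying assignment. The Python sketch below mirrors the input and output data structures described above, under assumptions that are ours and not the paper's: literal $2v$ stands for $x_v$, literal $2v+1$ for $\lnot x_v$, and the output array holds one truth value per literal. It uses Kosaraju's algorithm and is not the CRCW PRAM algorithm.

```python
def solve_2sat(num_clauses, num_vars, clauses):
    """Return (satisfiable, values); values[lit] is the truth value of literal lit.

    Assumed (hypothetical) encoding: literal 2*v is x_v and 2*v + 1 is not x_v,
    so lit ^ 1 is the negation of lit.
    """
    n_lit = 2 * num_vars
    graph = [[] for _ in range(n_lit)]
    rgraph = [[] for _ in range(n_lit)]
    for a, b in clauses[:num_clauses]:
        # (a or b) is equivalent to (not a -> b) and (not b -> a)
        for u, v in ((a ^ 1, b), (b ^ 1, a)):
            graph[u].append(v)
            rgraph[v].append(u)

    # Kosaraju pass 1: iterative DFS recording vertices by finishing time
    order, seen = [], [False] * n_lit
    for s in range(n_lit):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, 0)]
        while stack:
            u, i = stack[-1]
            if i < len(graph[u]):
                stack[-1] = (u, i + 1)
                v = graph[u][i]
                if not seen[v]:
                    seen[v] = True
                    stack.append((v, 0))
            else:
                stack.pop()
                order.append(u)

    # Pass 2: DFS on the reversed graph in decreasing finishing time;
    # components are numbered in topological order of the condensation
    comp = [-1] * n_lit
    label = 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = label
        stack = [s]
        while stack:
            u = stack.pop()
            for v in rgraph[u]:
                if comp[v] == -1:
                    comp[v] = label
                    stack.append(v)
        label += 1

    values = [False] * n_lit
    for v in range(num_vars):
        pos, neg = 2 * v, 2 * v + 1
        if comp[pos] == comp[neg]:
            return False, []
        # A literal is true if its component comes later in topological order
        values[pos] = comp[pos] > comp[neg]
        values[neg] = not values[pos]
    return True, values
```

For example, `solve_2sat(3, 2, [(0, 2), (1, 2), (0, 3)])` encodes $(x_0 \lor x_1)(\lnot x_0 \lor x_1)(x_0 \lor \lnot x_1)$, whose only satisfying assignment sets both variables true.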
The rapid growth of information in the digital world, especially on the web, calls for automated methods of organizing digital information for convenient access and efficient information retrieval. Topic modeling is a branch of machine learning and probabilistic graphical modeling that helps arrange web pages according to their topical structure. Topic modeling can reveal the topic distribution over a set of documents (web pages) and the affinity of each document for a specific topic. Topic modeling algorithms are typically computationally expensive because of their iterative nature. Recent research has successfully parallelized specific topic models. However, these parallel algorithms consist of tightly coupled processes that require frequent synchronization, and they are also tightly coupled to the underlying topic model used to infer the topic hierarchy. In this paper, we propose a parallel algorithm to infer topic hierarchies from a large-scale document corpus. A key feature of the proposed algorithm is that it exploits coarse-grained parallelism: the components running in parallel need not synchronize after every iteration, so the algorithm lends itself to implementation on a geographically dispersed set of processing elements interconnected through a network. The parallel algorithm achieves a speedup of 53.5 on a 32-node cluster of dual-core workstations while attaining approximately the same likelihood or predictive accuracy as the sequential algorithm on Information Retrieval tasks. (C) 2015 Elsevier Ltd. All rights reserved.
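The key property claimed above is coarse-grained parallelism: workers proceed independently and combine results without synchronizing after every iteration. The Python sketch below illustrates only that communication pattern. Each worker computes statistics over its own shard of the corpus, here plain word counts standing in as a labeled placeholder for the local topic-model state, and the partial results are merged once at the end. The function names and the round-robin sharding are hypothetical, and no actual topic inference is performed.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def local_statistics(shard):
    """Placeholder for a worker's local inference: here, just word counts."""
    counts = Counter()
    for doc in shard:
        counts.update(doc)
    return counts


def coarse_grained_run(corpus, n_workers):
    """Shard the corpus, let workers run with no intermediate synchronization,
    then merge their results exactly once."""
    shards = [corpus[i::n_workers] for i in range(n_workers)]  # round-robin sharding
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(local_statistics, shards))
    merged = Counter()
    for part in partials:
        merged.update(part)  # the single merge step
    return merged
```

Because the workers exchange nothing until the final merge, the same structure could run on loosely connected machines, with the merge as the only network communication.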
Suffix trees are widely used for indexing genome sequences. Although they support highly efficient search, suffix trees have drawbacks such as very large size and long construction time. In this paper, we propose a very fast parallel algorithm for constructing a disk-based suffix tree for human genome sequences. Our algorithm builds a suffix array for a portion of the suffixes of the human genome sequence and then quickly converts it into a suffix tree. It was up to 2.09 and 3.04 times faster than the previous algorithms of Loh et al. and Barsky et al., respectively.
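Building a suffix array for only part of the suffixes can be illustrated by partitioning suffixes by their first character: each partition is sorted independently, and concatenating the partitions in character order gives the full suffix array. The Python sketch below does this with naive comparison sorting and also computes the longest-common-prefix (LCP) array, the usual input for converting a suffix array into a suffix tree. The partitioning rule and function names are our assumptions; the sketch is in-memory and quadratic in the worst case, unlike the disk-based construction described in the paper, and it omits the tree conversion. It also assumes the text ends with a unique sentinel (`$`) that sorts before the bases.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def partition_suffix_array(text, first_char):
    """Sorted start positions of the suffixes of text that begin with first_char."""
    starts = [i for i, c in enumerate(text) if c == first_char]
    starts.sort(key=lambda i: text[i:])  # naive comparison sort of suffixes
    return starts


def lcp_length(a, b):
    """Length of the longest common prefix of strings a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def partitioned_suffix_array(text):
    """Sort each first-character partition independently, then concatenate."""
    alphabet = sorted(set(text))
    with ThreadPoolExecutor() as pool:
        # pool.map preserves input order, so partitions come back in character order
        parts = list(pool.map(partial(partition_suffix_array, text), alphabet))
    sa = [i for part in parts for i in part]
    # lcp[r] = common prefix length of the suffixes ranked r-1 and r
    lcp = [0] + [lcp_length(text[sa[r - 1]:], text[sa[r]:]) for r in range(1, len(sa))]
    return sa, lcp
```

For `"ACGTACG$"` this yields the suffix array `[7, 4, 0, 5, 1, 6, 2, 3]`, the same result as sorting all suffixes at once.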
This paper presents a parallel algorithm that computes the breadth-first search (BFS) numbering of a directed graph in $O(\log^2 n)$ time using $M(n)$ processors on the exclusive-read exclusive-write (EREW) parallel random access machine (PRAM) model, where $M(n)$ denotes the number of processors needed to multiply two $n \times n$ integer matrices over the ring $(\mathbb{Z}, +, \times)$ in $O(\log n)$ time. The best known bound for $M(n)$ is $O(n^{2.376})$ (Coppersmith and Winograd, 1987). The algorithm presented in this paper uses fewer processors than the classical BFS algorithm based on matrix powering over the semiring (dioid) $(\mathbb{N}, \min, +)$, which uses $O(\log n)$ time and $O(n^3)$ processors on the concurrent-read concurrent-write (CRCW) model, or $O(\log^2 n)$ time and $n^3/\log n$ processors on the EREW model.
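The classical baseline mentioned above, matrix powering over the $(\min, +)$ semiring, is easy to state in code. Starting from the adjacency matrix with 0 on the diagonal and $\infty$ for non-edges, $\lceil \log_2(n-1) \rceil$ min-plus squarings yield all shortest-path lengths, and row $s$ gives the BFS level of every vertex reachable from $s$. The Python sketch below is a sequential illustration of that baseline only, with hypothetical function names; it computes BFS levels rather than the BFS numbering, and it does not implement the paper's integer-matrix-multiplication method.

```python
import math


def min_plus(A, B):
    """Matrix product over the (min, +) semiring."""
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def bfs_levels(adj):
    """adj[u] lists the successors of u; returns D with D[s][v] = BFS level of v from s."""
    n = len(adj)
    # Path lengths of at most 1: 0 on the diagonal, 1 for edges, infinity otherwise
    D = [[0 if i == j else (1 if j in adj[i] else math.inf) for j in range(n)]
         for i in range(n)]
    covered = 1  # D currently holds shortest paths of length <= covered
    while covered < n - 1:
        D = min_plus(D, D)  # each squaring doubles the covered path length
        covered *= 2
    return D
```

Each squaring is $n^3$ independent additions followed by $n^2$ minimum reductions, which is where the $O(n^3)$ processor count of the classical approach comes from.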