In this paper, we first demonstrate that the classical Purcell's vector method, when combined with row pivoting, yields a consistently small growth factor in comparison to the well-known Gauss elimination method, the Gauss-Jordan method and the Gauss-Huard method with partial pivoting. We then present six parallel algorithms of the Purcell method that may be used for the direct solution of linear systems. The algorithms differ in their pivoting and load-balancing strategies. We recommend algorithms V and VI for their reliability and algorithms III and IV for good load balance if local pivoting is acceptable. Some numerical results are presented. (C) 2002 Elsevier Science B.V. All rights reserved.
The geometric connected component labeling (GCCL) problem occurs as an important subproblem when parallel algorithms are sought for problems like VLSI circuit extraction. Parallel algorithms for this problem on a hypercube multiprocessor can be designed by dividing the domain, consisting of a number of rectangles, into regions using a slice or rectangular partitioning scheme. Each processor in the hypercube is assigned one partition. The processor determines the connected sets of rectangles in its partition. The connected sets at different processors must then be combined across processors into globally connected sets. This merging problem is defined as the GCCL problem. In this paper, we present different algorithms for the GCCL problem. Each of the algorithms involves d stages of message passing for a d-dimensional hypercube. The basic idea in these algorithms is that in each stage a processor increases its knowledge of the domain. The algorithms described in this paper differ in their run time, memory requirements, and message complexity. These algorithms have been implemented on an Intel iPSC2/D4/MX hypercube and the results are described in the paper.
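As a rough illustration of the local step described above, in which a processor determines the connected sets of rectangles in its own partition, the following is a minimal sequential sketch using union-find. It is not one of the paper's GCCL algorithms: the rectangle format (x1, y1, x2, y2), the rule that rectangles sharing at least one point are connected, and all function names are assumptions made for this example.

```python
def find(parent, i):
    """Return the representative of i, halving paths along the way."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def union(parent, a, b):
    """Merge the sets of a and b, keeping the smaller index as the root."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def touches(r, s):
    """Assumed connectivity rule: closed rectangles that share a point."""
    return r[0] <= s[2] and s[0] <= r[2] and r[1] <= s[3] and s[1] <= r[3]


def label_rectangles(rects):
    """Label each rectangle with the smallest index in its connected set."""
    parent = list(range(len(rects)))
    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if touches(rects[i], rects[j]):
                union(parent, i, j)
    return [find(parent, i) for i in range(len(rects))]


if __name__ == "__main__":
    rects = [(0, 0, 1, 1), (1, 0, 2, 1), (5, 5, 6, 6)]
    print(label_rectangles(rects))
```

In a distributed setting, the merge across partitions could reuse the same union operation on pairs of rectangles that meet at partition boundaries; how those pairs are exchanged between processors is exactly what the paper's algorithms address and is not modeled here.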
Parallel algorithms are presented for modules of learning automata with the objective of improving their speed of convergence without compromising accuracy. A general procedure suitable for parallelizing a large class of sequential learning algorithms on a shared memory system is proposed. Results are derived to show the quantitative improvements in speed obtainable using parallelization. The efficacy of the procedure is demonstrated by simulation studies on algorithms for common payoff games, parametrized learning automata and pattern classification problems with noisy classification of training samples.
We consider two problems pertaining to P4-comparability graphs, namely, the problem of recognizing whether a simple undirected graph is a P4-comparability graph and the problem of producing an acyclic P4-transitive orientation of such a graph. Sequential algorithms for these problems have been presented by Hoang and Reed and very recently by Raschle and Simon, and by Nikolopoulos and Palios. In this paper, we establish properties of P4-comparability graphs which allow us to describe parallel algorithms for the recognition and orientation problems on this class of graphs; for a graph on n vertices and m edges, our algorithms run in O(nm) time and require O(nm/log n) processors on the CREW PRAM model. Since the currently fastest sequential algorithms for these problems run in O(nm) time, our algorithms are cost-efficient; moreover, to the best of our knowledge, this is the first attempt to introduce parallelization in problems involving P4-comparability graphs. Our approach relies on the parallel computation and proper orientation of the P4-components of the input graph. (C) 2003 Elsevier Inc. All rights reserved.
This paper presents efficient and portable implementations of two useful primitives in image processing algorithms, histogramming and connected components. Our general framework is a single-address space, distributed memory programming model. We use efficient techniques for distributing and coalescing data as well as efficient combinations of task and data parallelism. Our connected components algorithm uses a novel approach for parallel merging which performs drastically limited updating during iterative steps, and concludes with a total consistency update at the final step. The algorithms have been coded in SPLIT-C and run on a variety of platforms. Our experimental results are consistent with the theoretical analysis and provide the best known execution times for these two primitives, even when compared with machine-specific implementations. (C) 1996 Academic Press, Inc.
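For readers unfamiliar with the two primitives named above, the following is a minimal sequential sketch of histogramming and connected component labeling on a small grey-level image. It is not the paper's Split-C implementation and does not model its parallel merging step; the conventions that pixel value 0 is background and that components are 4-connected regions of equal value are assumptions for this example.

```python
from collections import deque


def histogram(image, levels):
    """Count how many pixels take each grey level in range(levels)."""
    counts = [0] * levels
    for row in image:
        for value in row:
            counts[value] += 1
    return counts


def connected_components(image):
    """Label 4-connected regions of equal nonzero value (0 = background).

    Returns (labels, count); labels use 1..count, background stays 0.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 0 or labels[r][c]:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] == image[y][x]
                            and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


if __name__ == "__main__":
    image = [[1, 1, 0],
             [0, 0, 2],
             [1, 0, 2]]
    print(histogram(image, 3))
    print(connected_components(image))
```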
The nearest neighbor search problem in general dimensions finds application in computational geometry, computational statistics, pattern recognition, and machine learning. Although there is a significant body of work on theory and algorithms, surprisingly little work has been done on algorithms for high-end computing platforms, and no open source library exists that can scale efficiently to thousands of cores. In this paper, we present algorithms and a library built on top of the message passing interface (MPI) and OpenMP that enable nearest neighbor searches to hundreds of thousands of cores for arbitrary-dimensional datasets. The library supports both exact and approximate nearest neighbor searches. The latter is based on iterative, randomized, and greedy KD-tree (k-dimensional tree) searches. We describe novel algorithms for the construction of the KD-tree, give complexity analysis, and provide experimental evidence for the scalability of the method. In our largest runs, we were able to perform an all-neighbors query search on a 13 TB synthetic dataset of 0.8 billion points in 2,048 dimensions on the 131K cores on Oak Ridge's XK6 "Jaguar" system. These results represent several orders of magnitude improvement over current state-of-the-art methods. Also, we apply our method to nonsynthetic data from machine learning data repositories. For example, we perform an all-nearest-neighbors search on a variant of the "MNIST" handwritten digit dataset with 8 million points in 784 dimensions on 16,384 cores of the "Stampede" system at the Texas Advanced Computing Center, achieving less than one second per RKDT iteration.
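To make the KD-tree idea concrete, the following is a minimal single-process sketch of an exact nearest neighbor query on a KD-tree. It is not the library described above: it uses one deterministic tree with median splits rather than the randomized, iterated trees the abstract mentions, and the dictionary-based node layout and function names are assumptions for this example.

```python
def build(points, depth=0):
    """Build a KD-tree by splitting at the median along a cycling axis."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build(pts[:mid], depth + 1),
        "right": build(pts[mid + 1:], depth + 1),
    }


def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(node, query, best=None):
    """Return the tree point closest to query (exact search with pruning)."""
    if node is None:
        return best
    if best is None or sqdist(node["point"], query) < sqdist(best, query):
        best = node["point"]
    diff = query[node["axis"]] - node["point"][node["axis"]]
    if diff < 0:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than best.
    if diff * diff < sqdist(best, query):
        best = nearest(far, query, best)
    return best


if __name__ == "__main__":
    points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
    tree = build(points)
    print(nearest(tree, (9, 2)))
```

The pruning test is what loses effectiveness as the dimension grows, which is one common motivation for approximate searches of the kind the abstract describes.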
Corner stitching is the underlying data structure that is used to represent rectangular objects in interactive VLSI layout editing systems such as Magic and Tailor. In this paper we develop efficient algorithms for basic corner stitching operations under the message-passing paradigm. These algorithms were implemented using C and PVM on a distributed network composed of SUN workstations. Experimental results show that significant speed-ups were obtained. (C) 1998 John Wiley & Sons, Ltd.
This paper surveys recent progress in the development of parallel algorithms for solving sparse linear systems on computer architectures having multiple processors. Attention is focused on direct methods for solving sparse symmetric positive definite systems, specifically by Cholesky factorization. Recent progress on parallel algorithms is surveyed for all phases of the solution process, including ordering, symbolic factorization, numeric factorization, and triangular solution.
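The numeric factorization and triangular solution phases mentioned above can be illustrated with a small dense, sequential sketch. It deliberately ignores sparsity, ordering, and symbolic factorization, which are the focus of the survey, and the function names are assumptions for this example.

```python
import math


def cholesky(a):
    """Return lower-triangular L with A = L * L^T for an SPD matrix A."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            t = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = t / l[j][j]
    return l


def solve(a, b):
    """Solve A x = b via Cholesky and two triangular solves."""
    l = cholesky(a)
    n = len(b)
    # Forward substitution: L y = b.
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    # Back substitution: L^T x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


if __name__ == "__main__":
    a = [[4.0, 2.0], [2.0, 3.0]]
    b = [6.0, 5.0]
    print(solve(a, b))
```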
We describe the first parallel algorithm with optimal speedup for constructing minimum-width tree decompositions of graphs of bounded treewidth. On n-vertex input graphs, the algorithm works in O((log n)^2) time using O(n) operations on the EREW PRAM. We also give faster parallel algorithms with optimal speedup for the problem of deciding whether the treewidth of an input graph is bounded by a given constant and for a variety of problems on graphs of bounded treewidth, including all decision problems expressible in monadic second-order logic. On n-vertex input graphs, the algorithms use O(n) operations together with O(log n log* n) time on the EREW PRAM, or O(log n) time on the CRCW PRAM.
Relational Coarsest Partition Problems (RCPPs) play a vital role in verifying concurrent systems. It is known that RCPPs are P-complete and hence it may not be possible to design polylog time parallel algorithms for these problems. In this paper, we present two efficient parallel algorithms for RCPP in which its associated labeled transition system is assumed to have m transitions and n states. The first algorithm runs in O(n^(1+epsilon)) time using m/n^epsilon CREW PRAM processors, for any fixed epsilon < 1. This algorithm is analogous to and optimal with respect to the sequential algorithm of Kanellakis and Smolka. The second algorithm runs in O(n log n) time using m/n CREW PRAM processors. This algorithm is analogous to and nearly optimal with respect to the sequential algorithm of Paige and Tarjan.
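To show what a relational coarsest partition is, the following is a minimal sequential sketch that refines an initial partition until it is stable. It uses naive signature-based refinement, not the Kanellakis-Smolka or Paige-Tarjan algorithms and not the parallel algorithms of the paper. It also assumes a single unlabeled transition relation, and the function name and input format are chosen only for this example.

```python
def coarsest_partition(states, transitions, initial_block):
    """Refine initial_block until states in one block reach the same blocks.

    states: list of hashable states
    transitions: iterable of (source, target) pairs (one relation, assumed)
    initial_block: dict mapping each state to an initial block id
    Returns a dict mapping each state to its final block id.
    """
    successors = {s: [] for s in states}
    for src, dst in transitions:
        successors[src].append(dst)

    block = dict(initial_block)
    while True:
        ids = {}
        refined = {}
        for s in states:
            # Signature: current block plus the set of blocks reachable
            # in one step. Including the current block means the new
            # partition always refines the old one.
            signature = (block[s],
                         frozenset(block[t] for t in successors[s]))
            refined[s] = ids.setdefault(signature, len(ids))
        if len(ids) == len(set(block.values())):
            return refined  # no block was split, so the partition is stable
        block = refined


if __name__ == "__main__":
    states = [0, 1, 2, 3]
    chain = [(0, 1), (1, 2), (2, 3)]
    start = {s: 0 for s in states}
    print(coarsest_partition(states, chain, start))
```

Each round takes time proportional to the number of states and transitions, and there can be up to n rounds, so this sketch is far from the bounds quoted in the abstract; it is meant only to pin down the problem being solved.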