This paper considers a variety of geometric pattern recognition problems on input sets of size n using a coarse grained multicomputer model consisting of p processors with Omega(n/p) local memory each (i.e., Omega(n/p...
详细信息
This paper considers a variety of geometric pattern recognition problems on input sets of size n using a coarse grained multicomputer model consisting of p processors with Omega(n/p) local memory each (i.e., Omega(n/p) memory cells of Theta(log n) bits apiece), where the processors are connected to an arbitrary interconnection network. It introduces efficient scalable parallel algorithms for a number of geometric problems including the rectangle finding problem, the maximal equally spaced collinear points problem, and the point set pattern matching problem. All of the algorithms presented are scalable in that they are applicable and efficient over a very wide range of ratios of problem size to number of processors. In addition to the practicality imparted by scalability, these algorithms are easy to implement in that all required communications can be achieved by a small number of calls to standard global routing operations. (C) 1999 Academic Press.
Global optimization is playing an increasing role in physics, chemistry, and biophysical chemistry. One of the most important applications of global optimization is to find the global minima of the potential energy of...
详细信息
Global optimization is playing an increasing role in physics, chemistry, and biophysical chemistry. One of the most important applications of global optimization is to find the global minima of the potential energy of molecules or molecular assemblies, such as crystals. The solution of this problem typically requires huge computational effort. Even the fastest processor available is not fast enough to carry out this kind of computation in real time for the problems of real interest, e.g., protein and crystal structure prediction. One way to circumvent this problem is to take advantage of massively parallel computing. In this paper, we provide several examples of parallel implementations of global optimization algorithms developed in our laboratory. All of these examples follow the master/worker approach. Most of the methods are parallelized on the algorithmic (coarse-grain) level and one example of fine-grain parallelism is given, in which the function evaluation itself is computationally expensive. All parallel algorithms were initially implemented on an IBM/SP2 (distributed-memory) machine. In all cases, however, message passing is handled through the standard Message Passing Interface (MPI);consequently the algorithms can also be implemented on any distributed- or shared-memory system that runs MPI. The efficiency of these implementations is discussed. (C) 2000 Elsevier Science B.V. All rights reserved.
A constant-time algorithm for labeling the connected components of an N x N image on a reconfigurable network of N3 processors is Presented. The main contribution of the algorithm is a novel constant-time technique fo...
详细信息
A constant-time algorithm for labeling the connected components of an N x N image on a reconfigurable network of N3 processors is Presented. The main contribution of the algorithm is a novel constant-time technique for determining the minimum-labeled PE in each component. The number of processors used by the algorithm can be reduced to N2+(1/d), for any 1 less-than-or-equal-to d less-than-or-equal-to log N, if O(d) time is allowed.
作者:
Kapralski, AUniversity of Aizu
Computer Software Department Mathematical Foundations of Computer Science Laboratory Aizu-wakamatsu 965-80 Japan JP
The problem of finding the shortest path between the given terminal points s and t within a given planar figure F is considered. The approach contains basic methodology developed for any parallel or distributed system...
详细信息
The problem of finding the shortest path between the given terminal points s and t within a given planar figure F is considered. The approach contains basic methodology developed for any parallel or distributed system. The 2D scene or the edge of F are represented in the n Cartesian coordinate system (n-CCS). Several algorithms for the shortest path are given, each one to be applied in specified circumstances depending on the exact machine model or on additional information concerning geometrical properties of the figure. If these algorithms are implemented in a parallel depth search machine (PDSM), then the shortest path can be computed in time O(1). The maximum number of processors used is 0(n). The given methodology can also be adapted for producing an approximate solution when the shortest path is approximated by polygonal lines.
The high intensity of research and modeling in fields of mathematics, physics, biology and chemistry requires new computing resources. For the big computational complexity of such tasks computing time is large and cos...
详细信息
The high intensity of research and modeling in fields of mathematics, physics, biology and chemistry requires new computing resources. For the big computational complexity of such tasks computing time is large and costly. The most efficient way to increase efficiency is to adopt parallel principles. Purpose of this paper is to present the issue of parallel computing with emphasis on the analysis of parallel systems, the impact of communication delays on their efficiency and on overall execution time. Paper focuses is on finite algorithms for solving systems of linear equations, namely the matrix manipulation (Gauss elimination method, GEM). algorithms are designed for architectures with shared memory (open multiprocessing, openMP), distributed-memory (message passing interface, MPI) and for their combination (MPI + openMP). The properties of the algorithms were analytically determined and they were experimentally verified. The conclusions are drawn for theory and practice.
We present the first parallel algorithms that decide strong and branching bisimilarity in linear time. More precisely, if a transition system has n states, m transitions and |Act| action labels, we introduce an algori...
详细信息
We present the first parallel algorithms that decide strong and branching bisimilarity in linear time. More precisely, if a transition system has n states, m transitions and |Act| action labels, we introduce an algorithm that decides strong bisimilarity in O(n + |Act|) time on max(n, m) processors and an algorithm that decides branching bisimilarity in O(n + |Act|) time using up to max(n(2), m, |Act|n) processors.
Several recent papers have introduced asynchronous shared memory parallel models in an attempt to discover how removing the assumption of synchronization of processor steps may alter the parallel complexities of probl...
详细信息
Several recent papers have introduced asynchronous shared memory parallel models in an attempt to discover how removing the assumption of synchronization of processor steps may alter the parallel complexities of problems. Preliminary work has resulted in the development and analysis of algorithms for a few specific problems. The best known general technique for transforming synchronous algorithms into asynchronous ones has been to synchronize all processors after each step of a synchronous computation. This results in the time complexity being multiplied by a factor that may be logarithmic in the number of processors, where time is defined to be the expected maximum numbers of steps taken by any processor, with respect to several families of distributions on processor schedules. Here we give a transformation technique that forms an asynchronous algorithm that runs in O(t + log p) time using p processors from any synchronous algorithm that runs in O(t) time on a p-processor CROW PRAM. This result can be extended to more general classes of synchronous algorithms. The technique yields an O(log p) time algorithm for p-input sorting, making possible the simulation of algorithms that run on the PRIORITY CRCW PRAM. (C) 1995 Academic Press, Inc.
There is a growing interest in designing parallel algorithms on restricted classes of graphs. Some optimal parallel algorithms are presented on circular-arc graphs - specifically, optimal parallel algorithms for find...
详细信息
There is a growing interest in designing parallel algorithms on restricted classes of graphs. Some optimal parallel algorithms are presented on circular-arc graphs - specifically, optimal parallel algorithms for finding (unweighted) maximum independent set, minimum clique cover, and minimum dominating set in a unified way. Given the sorted array of endpoints of the arcs in the intersection model of the circular-arc graph, all these problems can be solved sequentially in O(n) time where n is the number of arcs. All the parallel algorithms run in O(log n) time on an (n/log n)-processor EREW PRAM (parallel random access memory). Maximum cliques, maximum independent sets, and minimum dominating sets are finding applications in project scheduling, cluster analysis, parallel processing, and facilities location.
EEG-classification is necessary, for instance, to test the hearing of babies and to adapt hearing aids. A CPU-time of 3 hours is necessary for these classifications on serial computers like CYBER 845. This is too long...
详细信息
EEG-classification is necessary, for instance, to test the hearing of babies and to adapt hearing aids. A CPU-time of 3 hours is necessary for these classifications on serial computers like CYBER 845. This is too long to do examinations systematically. We have reduced the CPU-time for these classifications on the DAP. The result is that the DAP needs only 6 minutes CPU-time for the classifications with 8 attributes instead of 4 attributes used in the serial case.
A problem in pattern recognition is to find the maximum sum over ail rectangular subregions of a given (n X n) matrix of real numbers. The problem has one-dimensional (1D) and two-dimensional (2D) versions. For the 1D...
详细信息
A problem in pattern recognition is to find the maximum sum over ail rectangular subregions of a given (n X n) matrix of real numbers. The problem has one-dimensional (1D) and two-dimensional (2D) versions. For the 1D version, it is to find the maximum sum over all contiguous subvectors of a given vector of n real numbers. We give an algorithm for the 1D version running in O(log n) time using O((n)/(log n)) processors on the EREW PRAM, and an algorithm for the 2D version which takes O(log n) time using O((n(3))/(log n)) processors on the EREW PRAM.
暂无评论