It is well known that parallel computers can be used very effectively for image processing at the pixel level, by assigning a processor to each pixel or block of pixels, and passing information as necessary between processors whose blocks are adjacent. This paper discusses the use of parallel computers for processing images at the region level, assigning a processor to each region and passing information between processors whose regions are related. The basic difference between the pixel and region levels is that the regions (e.g. obtained by segmenting the given image) and relationships differ from image to image, and even for a given image, they do not remain fixed during processing. Thus, one cannot use the standard type of cellular parallelism, in which the set of processors and interprocessor connections remain fixed, for processing at the region level. Reconfigurable cellular computers, in which the set of processors that each processor can communicate with can change during a computation, are more appropriate. A class of such computers is described, and general examples are given illustrating how such a computer could initially configure itself to represent a given decomposition of an image into regions, and dynamically reconfigure itself, in parallel, as regions merge or split.
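As a hedged sketch of the dynamic reconfiguration the abstract describes, the region-level connections can be modeled as a region adjacency graph in which each node stands in for a per-region processor; merging two regions unions their neighbour sets. All class and method names below are illustrative, not taken from the paper.

```python
class RegionGraph:
    """Region adjacency graph; nodes model per-region processors."""

    def __init__(self):
        self.adj = {}  # region id -> set of adjacent region ids

    def add_region(self, r):
        self.adj.setdefault(r, set())

    def connect(self, a, b):
        self.adj[a].add(b)
        self.adj[b].add(a)

    def merge(self, a, b):
        """Merge region b into region a, rewiring b's neighbours to a."""
        for n in self.adj[b] - {a}:
            self.adj[n].discard(b)
            self.adj[n].add(a)
            self.adj[a].add(n)
        self.adj[a].discard(b)
        del self.adj[b]


g = RegionGraph()
for r in (1, 2, 3):
    g.add_region(r)
g.connect(1, 2)
g.connect(2, 3)
g.merge(1, 2)        # regions 1 and 2 coalesce
print(sorted(g.adj))  # -> [1, 3]
print(g.adj[1])       # -> {3}
```

A splitting operation would run the same rewiring in reverse; in the paper's setting each such update happens in parallel across the affected processors rather than in a central data structure.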
In this paper, boundary value techniques for solving parabolic equations (PBV methods) will be proposed. The methods will be derived from the BVM methods for ODEs studied in [11,12], the parallel implementation of which seems to be particularly efficient, especially for differential systems with a steady-state solution. The stability and convergence properties of the proposed PBV methods will be studied. Numerical tests will be given both to illustrate the numerical features and the performance of the parallel version of a PBV method on a network of transputers. In particular, we will consider the parallel implementation of the PBV method based on the Simpson rule, which will be compared with its scalar version and with Hindmarsh's scalar LSODE code.
The use of massively parallel computers provides an avenue to overcome the computational requirements of air quality modeling. General considerations on parallel implementation of air quality models are outlined including domain decomposition. The implementation of the CIT urban photochemical model on the Intel Touchstone Delta, a distributed memory multiple instruction/multiple data (MIMD) machine is described. When both the transport and chemistry portions of the model are parallelized, a speed-up of about 30 is achieved using 256 processors.
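As an illustration of the domain decomposition step mentioned above (the partitioning scheme and names here are assumptions, not taken from the CIT implementation), a minimal 1-D block decomposition assigns each processor a contiguous slab of grid columns, spreading any remainder:

```python
def decompose(ncols, nproc):
    """Return (start, stop) column ranges, one per processor,
    spreading the remainder over the first few processors."""
    base, extra = divmod(ncols, nproc)
    ranges, start = [], 0
    for p in range(nproc):
        stop = start + base + (1 if p < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


print(decompose(10, 3))  # -> [(0, 4), (4, 7), (7, 10)]
```

In a real transport/chemistry solver each slab would additionally carry halo cells exchanged with neighbouring processors each time step.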
Author: HEIDE, FMAD
Johann Wolfgang Goethe-Univ, Fachbereich Informatik, Frankfurt am Main, West Germany
A parallel computer (PC) with a fixed communication network is called fair if the degree of this network is bounded; otherwise it is called unfair. In a PC with predictable communication, each processor can precompute the addresses of the processors it wants to communicate with in the next t steps in $O(t)$ steps. For an arbitrary $\varepsilon > 0$ we define fair PCs M and $M'$ with $O(n^{1 + \varepsilon } )$ processors each. $M$ ($M'$) can simulate each unfair PC with predictable communication and $O(\log (n))$ storage locations per processor (each fair PC) with n processors with constant time loss. $M'$ improves a result from [Acta Informatica, 19 (1983), pp. 269–296], where a time loss of $O(\log \log (n))$ was achieved. Assuming some reasonable properties of simulations, we finally prove a lower bound of $\Omega (\log (n))$ for the time loss of a fair PC which can simulate each unfair PC. Applying fast sorting or packet switching algorithms (Proc. 15th Annual ACM Symposium on Theory of Computing, Boston, 1983, pp. 1–9; 10–16; Proc. ACM Symposium on Principles of Distributed Computing, Ottawa, 1982), one sees easily that this bound is asymptotically tight.
In this work we review the present status of numerical methods for partial differential equations on vector and parallel computers. A discussion of the relevant aspects of these computers and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial-boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. A brief discussion of application areas utilizing these computers is included.
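As a hedged sketch of the explicit methods for initial-boundary value problems the survey covers, consider one forward-Euler time step for the 1-D heat equation $u_t = u_{xx}$; the update is a pointwise stencil, which is exactly what vectorizes and parallelizes well. The grid sizes and coefficients below are illustrative; stability requires $\Delta t \le \Delta x^2 / 2$.

```python
def explicit_step(u, dt, dx):
    """One forward-Euler step for u_t = u_xx on a 1-D grid,
    with fixed (Dirichlet) boundary values."""
    r = dt / dx ** 2
    return ([u[0]]
            + [u[i] + r * (u[i + 1] - 2 * u[i] + u[i - 1])
               for i in range(1, len(u) - 1)]
            + [u[-1]])


dx = 0.1
u = [0.0] * 11
u[5] = 1.0                               # initial heat spike
u = explicit_step(u, 0.4 * dx ** 2, dx)  # r = 0.4 satisfies r <= 1/2
print(u[4:7])                            # spike spreads: [0.4, 0.2, 0.4]
```

Implicit methods replace this pointwise update with a linear solve per step, which is where the direct and iterative solvers discussed for elliptic equations re-enter the picture.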
A practical methodology for evaluating and comparing the performance of distributed memory Multiple Instruction Multiple Data (MIMD) systems is presented. The methodology determines machine parameters and program parameters separately, and predicts the performance of a given workload on the machines under consideration. Machine parameters are measured using benchmarks that consist of parallel algorithm structures. The methodology takes a workload-based approach in which a mix of application programs constitutes the workload. The performance of different systems is compared, under the given workload, using the ratio of their speeds. In order to validate the methodology, an example workload has been constructed and the time estimates have been compared with the actual runs, yielding good predicted values. Variations in the workload are analysed in terms of increases in problem sizes and changes in the frequency of particular algorithm groups. Utilization and scalability are used to compare the systems when the number of processors is increased. It has been shown that the performance of parallel computers is sensitive to changes in the workload and therefore any evaluation and comparison must consider a given user workload. The performance improvement that can be obtained by increasing the size of a distributed memory MIMD system depends on the characteristics of the workload as well as the parameters that characterize the communication speed of the parallel system.
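The separation of machine and program parameters can be mimicked in miniature (all group names and costs below are invented for illustration, not the paper's benchmark set): predict a workload's time on each machine as a frequency-weighted sum over algorithm groups, then compare machines by the ratio of predicted times.

```python
def predicted_time(freqs, machine_cost):
    """freqs: algorithm group -> executions in the workload (program parameters);
    machine_cost: algorithm group -> seconds per execution (machine parameters)."""
    return sum(n * machine_cost[g] for g, n in freqs.items())


workload  = {"broadcast": 100, "reduce": 50, "stencil": 200}
machine_a = {"broadcast": 0.02, "reduce": 0.01, "stencil": 0.005}
machine_b = {"broadcast": 0.01, "reduce": 0.02, "stencil": 0.004}

ta = predicted_time(workload, machine_a)  # 2.0 + 0.5 + 1.0 = 3.5 s
tb = predicted_time(workload, machine_b)  # 1.0 + 1.0 + 0.8 = 2.8 s
print(ta / tb)                            # speed of B relative to A: 1.25
```

Changing the frequencies in `workload` changes which machine wins, which is the abstract's point that any comparison must be made under a given user workload.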
In this paper, we consider the problem of selection on coarse-grained distributed memory parallel computers. We discuss several deterministic and randomized algorithms for parallel selection. We also consider several algorithms for load balancing needed to keep a balanced distribution of data across processors during the execution of the selection algorithms. We have carried out detailed implementations of all the algorithms discussed on the CM-5 and report on the experimental results. The results clearly demonstrate the role of randomization in reducing communication overhead.
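A minimal sequential sketch of randomized selection (quickselect) shows the kernel the paper parallelizes; in the coarse-grained versions, each partition step is carried out across processors, with load balancing keeping the surviving data evenly distributed between rounds.

```python
import random

def select(data, k):
    """Return the k-th smallest element (0-based) of data
    using a randomly chosen pivot each round."""
    data = list(data)
    while True:
        pivot = random.choice(data)
        lows   = [x for x in data if x < pivot]
        highs  = [x for x in data if x > pivot]
        pivots = len(data) - len(lows) - len(highs)
        if k < len(lows):
            data = lows                  # answer lies in the low side
        elif k < len(lows) + pivots:
            return pivot                 # answer equals the pivot
        else:
            k -= len(lows) + pivots      # answer lies in the high side
            data = highs


print(select([7, 2, 9, 4, 1, 8], 2))  # -> 4
```

The random pivot is what keeps the expected number of rounds, and hence the communication rounds in the parallel setting, low, matching the abstract's observation about randomization reducing communication overhead.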
I/O in computer systems is prone to become a bottleneck. This is a particularly severe problem in highly parallel machines, where some applications are fully I/O bound if only one or a few conventional I/O paths exist. Similar to the use of multiprocessor technology for increasing processing performance, disk I/O performance can be substantially improved by employing parallel I/O schemes. Based on a distributed I/O architecture for parallel computers, we propose to use disk caches on several architectural levels, and confirm this by simulations of various structural options. In this paper, we describe the cache modelling approach and the I/O load model, which has been derived from transaction-processing and general-purpose applications. Then we discuss the results for caches on single and multiple architecture levels. Large caches on I/O processors in combination with small caches on processing elements turn out to be the preferable structure. In addition, hardware caches can be employed at disk level for further performance improvement. For write operations, a delayed write strategy is shown to be superior to other modes.
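As a hedged sketch of why a delayed-write (write-back) strategy wins for write operations: writes update the cache and mark the block dirty, and the disk is touched only on eviction, so repeated writes to a hot block cost one disk write. The class below is an illustrative toy, not the paper's cache model.

```python
from collections import OrderedDict

class WriteBackCache:
    """Toy delayed-write cache with LRU-style eviction."""

    def __init__(self, capacity, disk):
        self.capacity, self.disk = capacity, disk
        self.blocks = OrderedDict()  # block id -> (data, dirty flag)
        self.disk_writes = 0

    def write(self, block, data):
        if block in self.blocks:
            self.blocks.move_to_end(block)       # refresh recency
        elif len(self.blocks) >= self.capacity:
            victim, (vdata, dirty) = self.blocks.popitem(last=False)
            if dirty:                            # flush only dirty victims
                self.disk[victim] = vdata
                self.disk_writes += 1
        self.blocks[block] = (data, True)        # delayed: mark dirty only


disk = {}
cache = WriteBackCache(capacity=2, disk=disk)
for _ in range(100):
    cache.write(0, "hot")        # 100 writes, still zero disk writes
cache.write(1, "a")
cache.write(2, "b")              # evicts block 0: the one disk write
print(cache.disk_writes)         # -> 1
```

A write-through cache would have issued 102 disk writes for the same sequence, which is the gap the simulations in the paper quantify.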
As the limits of sequential processing are being approached, the use of parallel architectures in the implementation of modern scientific computer systems grows in importance. The use of advanced and powerful microprocessors in these systems is a particularly effective approach. However, the costs associated with such systems make it critical to be able to forecast system behavior before an actual hardware prototype is constructed, in terms of such requirements as real-time performance, throughput, bandwidth, latency, reliability, availability, etc. Experience with the design and simulation of a new parallel computer is used as an example to illustrate a technique by which requirements can be accurately analyzed without the risk of premature hardware implementation. Using processor libraries, the complexities of the actual system are portrayed in the form of a software-based prototype system which provides significantly more accuracy than traditional simulation methods.
We present a software package for the simulation of very large neuronal networks on parallel computers. The package can be run on any system with an implementation of the Message Passing Interface standard. We also present some example results for a simple neuronal model in networks of up to a quarter of a million neurons. The full software package as well as usage and installation guidelines can be found in [http://***/neurosys]. (C) 2000 Elsevier Science B.V. All rights reserved.
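The abstract does not specify its "simple neuronal model", so as a hypothetical stand-in here is a leaky integrate-and-fire neuron, a common choice for large-network simulations; all parameters are illustrative, and the MPI distribution of neurons across ranks is omitted.

```python
def lif_step(v, i_input, dt=0.1, tau=10.0, v_thresh=1.0, v_reset=0.0):
    """Advance one membrane potential v by dt; return (v, spiked).
    Leaky integrate-and-fire: dv/dt = (-v + I) / tau, reset on threshold."""
    v = v + dt / tau * (-v + i_input)
    if v >= v_thresh:
        return v_reset, True
    return v, False


v, spikes = 0.0, 0
for _ in range(2000):
    v, fired = lif_step(v, i_input=1.5)  # constant drive above threshold
    spikes += fired
print(spikes)  # the neuron fires repeatedly under suprathreshold drive
```

In an MPI setting each rank would step its own subset of neurons like this and exchange spike events with the other ranks at each step.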