With advances in processor and networking technologies, current distributed-memory machines can achieve hundreds of Giga Floating-Point Operations Per Second (GFLOPS) of performance. On such machines, many application problems with regularly structured computations have been successfully parallelized using the explicit message-passing paradigm. However, it is difficult to parallelize vision problems with irregularly structured computations. Parallel solutions to these problems are characterized by uneven distribution of symbolic features among the processors, unbalanced workload, and irregular interprocessor data dependency caused by the input image. It is therefore necessary to develop efficient algorithmic techniques to achieve large speed-ups. In this paper, we propose an algorithmic framework to design efficient and portable parallel algorithms for irregular vision problems on distributed-memory machines. Based on this algorithmic framework, we develop techniques for task scheduling, load balancing, and overlapping communication with computation.
ISBN: 0818680679 (print)
Estimating the communication cost of executing a program on distributed-memory machines is important for evaluating the overhead due to repartitioning. We present a scheme that works with reasonable efficiency for arrays with at most three dimensions. The hyperplane partitioning technique given by [10] is extended to complete programs by estimating the communication cost with the scheme presented in this work.
Clusters of workstations are increasingly being viewed as a cost-effective alternative to parallel supercomputers. However, resource management and scheduling on workstation clusters are complicated by the fact that the number of idle workstations available for executing parallel applications is constantly fluctuating. In this paper, we present a case for scheduling parallel applications on non-dedicated workstation clusters using dynamic space-sharing, a policy under which the number of processors allocated to an application can be changed during its execution. We describe an approach that uses application-level checkpointing and data repartitioning to support dynamic space-sharing and to handle the dynamic reconfiguration triggered when failure or owner activity is detected on a workstation being used by a parallel application. The performance advantages of dynamic space-sharing are quantified through a simulation study, and experimental results are presented for the overhead of dynamic reconfiguration of a grid-oriented data-parallel application using our approach.
ISBN: 0819425885 (print)
This paper describes two different parallel computing approaches for image processing problems on a Pentium-based multiprocessor system. These multiprocessor computers are often used as network servers. We demonstrate the use of one of these machines, equipped with four Intel Pentium processors, for a parallel image processing task. A parallel computation of motion vector fields based on correlation techniques is discussed to show the possible acceleration. The computational results show that high efficiency can be reached; even a linear speedup is possible under certain conditions. Besides the correlation technique mentioned, there are various image processing problems that can easily be evaluated in parallel. Although massively parallel systems and special-purpose systems are much faster, off-line image processing can be accelerated by using these broadly available low-cost machines.
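The terms "efficiency" and "linear speedup" in this abstract can be made concrete with the standard definitions: speedup is serial time divided by parallel time, and efficiency is speedup divided by the processor count. The sketch below is only an illustration; the function names and the timings are hypothetical and are not taken from the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Standard speedup: serial run time divided by parallel run time."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, processors: int) -> float:
    """Parallel efficiency: speedup per processor (1.0 means linear speedup)."""
    return speedup(t_serial, t_parallel) / processors


# Hypothetical timings (seconds) for a four-processor machine, not measured data.
t1, t4 = 8.0, 2.0
s = speedup(t1, t4)          # 4.0
e = efficiency(t1, t4, 4)    # 1.0, i.e. linear speedup
```

An efficiency below 1.0 would indicate overhead such as synchronization or memory contention between the processors.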
ISBN: 0818680679 (print)
Monitoring tools are necessary components in the support of distributed applications. They can be used to provide dependability, debugging, and testing, to enhance performance, and to make run-time steering of applications possible. These tools are needed to make the best use of all the available high-performance computing resources of a heterogeneous environment. This paper describes HOLMES, an on-line monitoring system designed to support dynamic management of resources that requires run-time measurement. HOLMES identifies the evolving system state and provides any dynamic policy with the information it needs to assign resources by following application evolution. HOLMES makes it possible to control and steer an application even when it is distributed across heterogeneous architectures, from parallel machines to clusters of workstations and PCs.
The Laplace transform in time has been shown to provide an excellent alternative to the finite difference method for the solution of parabolic problems associated with partial differential equations. An implementation of the Laplace transform method in a parallel environment can provide a concurrent solution process with no communication overhead. The Laplace transform in time, when applied to the diffusion problem, results in a modified Helmholtz equation in the transform space. The diffusion problem is solved in a parallel environment in which the elliptic problem in transform space is solved using finite differences, finite elements, boundary elements, the method of fundamental solutions and Kansa's multiquadric method.
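The step from the diffusion problem to a modified Helmholtz equation can be written out as follows. This is a sketch under assumed notation that does not appear in the abstract: a constant diffusivity \kappa, an initial condition u_0(x), and the transform variable s.

```latex
% Assumed diffusion problem (notation is illustrative):
\frac{\partial u}{\partial t} = \kappa \nabla^2 u, \qquad u(x, 0) = u_0(x).

% Laplace transform in time:
U(x, s) = \int_0^{\infty} e^{-st}\, u(x, t)\, dt
\quad\Longrightarrow\quad
s\,U(x, s) - u_0(x) = \kappa \nabla^2 U(x, s).

% Rearranged as a modified Helmholtz equation with \lambda^2 = s/\kappa:
\nabla^2 U - \lambda^2 U = -\frac{u_0}{\kappa}.
```

Because time no longer appears, the elliptic problem for each value of s is independent of the others. One plausible reading of the "no communication overhead" claim is that each transform parameter can be solved on a separate processor, with the time-domain solution recovered afterwards by numerical inversion.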
Modern technology provides the infrastructure necessary to develop distributed applications capable of using the power of multiple supercomputing resources and exploiting their diversity. The performance potential off...
ISBN: 0780341473 (print)
This paper proposes that a parallel implementation of the genetic algorithm (GA) on the Internet will improve the algorithm's performance. It is motivated by the possibility of aiding research into complex search and optimization problems that use the GA. Requirements and constraints regarding parallelization of the GA are identified. A parallel GA is developed for an ideal PRAM architecture and is shown to have an asymptotic running time of O(log n), an improvement over the sequential GA. A parallel GA is also designed for a Unix network and has an asymptotic running time comparable to the ideal system. The algorithm is a decentralized, asynchronous, and fault-tolerant design that matches characteristics of the network. The GA population is divided into colonies that are distributed among processors. Trade policies are executed for the exchange of genes.
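The colony structure described above resembles what is commonly called an island-model GA. The sketch below is a sequential toy simulation of that general idea, not the paper's algorithm: the OneMax fitness function, the ring-shaped trade policy, and every name and parameter are assumptions chosen for illustration, and the asynchronous, fault-tolerant network layer is not modeled.

```python
import random


def fitness(bits):
    """Toy OneMax fitness (assumed for illustration): count of 1 bits."""
    return sum(bits)


def evolve_colony(colony, rng, mutation_rate=0.05):
    """One generation: elitism, tournament selection, crossover, mutation."""
    length = len(colony[0])
    nxt = [max(colony, key=fitness)[:]]  # keep the colony's best unchanged
    while len(nxt) < len(colony):
        a = max(rng.sample(colony, 2), key=fitness)
        b = max(rng.sample(colony, 2), key=fitness)
        cut = rng.randrange(1, length)  # one-point crossover position
        child = a[:cut] + b[cut:]
        nxt.append([bit ^ 1 if rng.random() < mutation_rate else bit
                    for bit in child])
    return nxt


def trade(colonies):
    """Hypothetical trade policy: on a ring, each colony receives a copy of
    its neighbour's best individual in place of its own worst one."""
    bests = [max(c, key=fitness)[:] for c in colonies]
    for i, colony in enumerate(colonies):
        worst = min(range(len(colony)), key=lambda j: fitness(colony[j]))
        colony[worst] = bests[i - 1]


def island_ga(num_colonies=4, colony_size=10, length=20,
              generations=30, trade_every=5, seed=0):
    """Run the colonies in turn (simulating separate processors)."""
    rng = random.Random(seed)
    colonies = [[[rng.randint(0, 1) for _ in range(length)]
                 for _ in range(colony_size)]
                for _ in range(num_colonies)]
    for gen in range(1, generations + 1):
        colonies = [evolve_colony(c, rng) for c in colonies]
        if gen % trade_every == 0:
            trade(colonies)
    return max((ind for c in colonies for ind in c), key=fitness)


best = island_ga()
```

In a real networked version each colony would run on its own processor and trades would be messages, so the trade interval controls how much communication the design incurs.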
An algorithm for the parallel computing of the boundary-element and finite-element combination method is presented in this paper. By decomposing the entire domain into boundary-element and finite-element subdomains, each analysis is performed independently and in parallel. The iterative update scheme for the parallel computation is the Schwarz method, which was adapted to the domain decomposition parallel scheme in the boundary-element analysis. A cluster parallel computing system of workstations connected by a LAN is constructed and employed for efficient analysis. Convergence and accuracy of solutions on internal virtual boundaries are shown through some numerical examples.
ISBN: 0818680679 (print)
Integer sorting is a subclass of the sorting problem where the elements have integer values and the largest element is polynomially bounded in the number of elements to be sorted. It is useful for applications in which the maximum value of the elements to be sorted is bounded. In this paper, we present a new distributed radix-sort algorithm for integer sorting. The structure of our algorithm is similar to radix sort except that it typically requires fewer communication phases. We present experimental results for our algorithm on two distributed-memory multiprocessors, the Intel Paragon and the Thinking Machines CM-5. These results are compared with two other well-known practical parallel sorting algorithms based on radix sort and sample sort. The experimental results show that the distributed radix-sort is competitive with the other two algorithms.
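For orientation, the baseline that the abstract compares against can be sketched as a plain least-significant-digit radix sort over per-processor blocks, with one communication phase per digit. This is not the paper's algorithm, whose point is to need fewer such phases; the processors are simulated as Python lists, and the function name and `radix_bits` parameter are assumptions.

```python
def radix_sort_blocks(blocks, radix_bits=8):
    """Sort non-negative integers held in per-processor blocks (simulated).

    Baseline LSD radix sort: one local bucketing step and one simulated
    all-to-all exchange per digit.
    """
    p = len(blocks)
    key_max = max((k for b in blocks for k in b), default=0)
    num_buckets = 1 << radix_bits
    mask = num_buckets - 1
    shift = 0
    while (key_max >> shift) > 0:
        # Local phase: every processor buckets its own keys by current digit.
        local = []
        for block in blocks:
            buckets = [[] for _ in range(num_buckets)]
            for key in block:
                buckets[(key >> shift) & mask].append(key)
            local.append(buckets)
        # Communication phase (simulated): gather bucket d from processor 0,
        # 1, ... in order, which keeps the pass stable.
        merged = [key
                  for d in range(num_buckets)
                  for buckets in local
                  for key in buckets[d]]
        # Redistribute the keys into p contiguous blocks of near-equal size.
        size = -(-len(merged) // p)  # ceiling division
        blocks = [merged[i * size:(i + 1) * size] for i in range(p)]
        shift += radix_bits
    return blocks


result = radix_sort_blocks([[170, 45, 75], [90, 802, 24], [2, 66]], radix_bits=4)
flat = [key for block in result for key in block]
```

With `radix_bits=4` and a largest key of 802, this example runs three digit passes, so it performs three simulated exchanges; a larger radix trades fewer passes for more buckets per processor.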