Parallel Genetic Algorithms (PGAs) are suited to problems with very large solution spaces, and they support efficient parallel distribution of work. In a PGA Island Model, the migration strategy can take advantage of high-latency communication channels in a distributed system. This suggests the use of networked workstation environments as a cost-effective alternative to MPP systems. To evaluate the proposed approach, a Genetic Algorithm Programming System (GAPS) was developed; it supports the design of parallel genetic programs and their execution in a distributed workstation environment. GAPS separates the specification of the problem and the user application interface from the implementation and management details of the run-time environment; it also addresses fault tolerance, which is needed to recover from faults that may occur in a dynamic network of heterogeneous workstations. GAPS uses PVM to implement a structural load-balancing strategy, which distributes complex evaluation functions with large chromosomes across a parallel machine. The proposed system proved effective when tested with the knapsack problem.
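As a rough illustration of the island model with periodic migration, the sketch below evolves several small populations on a toy knapsack instance and exchanges individuals only every few generations. It is a sequential simulation, not GAPS and not PVM code; the item list, capacity, ring topology, mutation rate, and all function names are assumptions made for the example.

```python
import random

# Hypothetical toy instance: (value, weight) pairs and a weight capacity.
ITEMS = [(60, 10), (100, 20), (120, 30), (40, 15), (70, 25), (30, 5)]
CAPACITY = 50

def fitness(chrom):
    """Total value of the selected items, or 0 if the capacity is exceeded."""
    value = sum(v for bit, (v, _) in zip(chrom, ITEMS) if bit)
    weight = sum(w for bit, (_, w) in zip(chrom, ITEMS) if bit)
    return value if weight <= CAPACITY else 0

def evolve(pop, rng):
    """One generation: elitism, binary tournaments, one-point crossover, bit-flip mutation."""
    def pick():
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b
    nxt = [list(max(pop, key=fitness))]  # keep the best individual unchanged
    while len(nxt) < len(pop):
        p1, p2 = pick(), pick()
        cut = rng.randrange(1, len(ITEMS))
        child = p1[:cut] + p2[cut:]
        # Flip each bit with an assumed 5% probability.
        child = [1 - bit if rng.random() < 0.05 else bit for bit in child]
        nxt.append(child)
    return nxt

def migrate(islands):
    """Ring migration: each island's best replaces the worst member of the next island."""
    bests = [max(pop, key=fitness) for pop in islands]  # snapshot before any replacement
    for i, pop in enumerate(islands):
        worst = min(range(len(pop)), key=lambda j: fitness(pop[j]))
        pop[worst] = list(bests[i - 1])

def run(n_islands=4, pop_size=12, generations=40, interval=10, seed=0):
    """Evolve the islands independently and migrate only every `interval` generations."""
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in ITEMS] for _ in range(pop_size)]
               for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        islands = [evolve(pop, rng) for pop in islands]
        if gen % interval == 0:
            migrate(islands)
    return max((c for pop in islands for c in pop), key=fitness)
```

Because islands communicate only at migration points, a long `interval` keeps message traffic low, which is the property that makes slow networks tolerable in this model.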
ISBN (print): 0818678763
The R-tree is a very popular dynamic access structure capable of storing multidimensional and spatial data. Considering its efficient global balance and dynamic reorganization, we use the R-tree to decluster multiattribute data in a database system or file system. Because many previous multiattribute declustering mechanisms do not take into account the properties of the Cluster of Workstations (COW), we present the Global Parallel R-Tree (GPR-Tree) under the COW architecture. First, we examine efficiency issues in the R-tree and its variants, and we try to enhance R-tree efficiency by using heuristic information when reconstructing the tree during node splitting and when treating the orphan entries of an underfilled node. Then we parallelize the improved R-tree among the components in the system. The basic idea is to alleviate the bottleneck effect of the I/O subsystem by making use of high-speed network communication and memory. The GPR-Tree is shared among the processing units (PUs) of the system. We use a mixed LRU algorithm to schedule pages in memory so that frequently visited nodes stay in memory. A write-update-like protocol keeps the multiple copies maintained in the system coherent. This mechanism will be shown to improve the scalability and performance of the system.
The Internet, best known by most users as the World Wide Web, continues to expand at an amazing pace. We propose a new infrastructure to harness its combined resources, such as CPU cycles or disk storage, and make them available to everyone interested. This infrastructure has the potential to support parallel supercomputing applications involving thousands of cooperating components. Our approach is based on recent advances in Internet connectivity and the implementation of safe distributed computing embodied in languages such as Java. We developed a prototype of a global computing infrastructure, called SuperWeb, that consists of hosts, brokers, and clients. Hosts register a fraction of their computing resources (CPU time, memory, bandwidth, disk space) with resource brokers. Client computations are then mapped by the broker onto the registered resources. We examine an economic model for trading computing resources, and discuss several technical challenges associated with such a global computing environment.
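The host, broker, and client roles can be pictured with a very small sketch. It assumes a single resource (a CPU share between 0 and 1) and a first-fit mapping policy; the `Broker` class, its methods, and the policy are hypothetical and are not taken from the SuperWeb prototype, which the abstract says is built around Java.

```python
class Broker:
    """Toy resource broker: hosts register CPU shares, clients request them."""

    def __init__(self):
        # Host name -> CPU share still available (assumed unit: fraction of one CPU).
        self.free = {}

    def register(self, host, cpu_share):
        """A host offers a fraction of its CPU to the broker."""
        self.free[host] = self.free.get(host, 0.0) + cpu_share

    def allocate(self, demand):
        """Map a client request onto the first host with enough free share.

        Returns the host name, or None when no registered host can serve it.
        """
        for host, share in self.free.items():
            if share >= demand:
                self.free[host] = share - demand
                return host
        return None
```

A real broker would also have to track memory, bandwidth, and disk space, and apply whatever pricing the economic model dictates; first-fit is used here only because it is the simplest policy to read.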
Reduction operations are very useful in parallel and distributed computing, with applications in barrier synchronization, distributed snapshots, termination detection, global virtual time computation, and more. In the context of parallel discrete-event simulations, we have previously introduced a class of adaptive synchronization algorithms based on fast reductions. Here, we explore the implementation of fast reductions on a popular high-performance computing platform, a network of workstations. The specific platform is a set of Pentium Pro PCs running the Linux operating system, interconnected by Myrinet, a Gbps network. The general reduction model on which our synchronization algorithms are based is introduced first, followed by a description of how this model can be implemented. We discuss several design trade-offs that must be made to achieve the driving goal of high-speed reductions, and we provide innovative algorithms to meet the correctness and performance requirements of the reduction model.
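For readers unfamiliar with reductions, the following sketch simulates a generic binary-tree reduction in a single process and counts the communication rounds it would take. It is a textbook pattern given for orientation only; it is not the reduction model or the Myrinet implementation described in the abstract, and `tree_reduce` is a name chosen for the example.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")

def tree_reduce(values: List[T], op: Callable[[T, T], T]) -> Tuple[T, int]:
    """Combine one value per node with `op`; return (result, rounds).

    In each round, node i (a multiple of 2*stride) absorbs the partial
    result held by node i + stride, so n nodes need ceil(log2(n)) rounds.
    """
    if not values:
        raise ValueError("need at least one value")
    vals = list(values)
    rounds = 0
    stride = 1
    while stride < len(vals):
        for i in range(0, len(vals) - stride, 2 * stride):
            vals[i] = op(vals[i], vals[i + stride])
        stride *= 2
        rounds += 1
    return vals[0], rounds
```

With `min` as the operator and local clock values as inputs, the same pattern yields the kind of global minimum that a global virtual time computation needs.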
Armstrong III is a multi-node multicomputer designed and built at the Laboratory for Engineering Man/Machine System (LEMS) of Brown University. Each node contains a RISC processor and reconfigurable resources implemented with FPGAs. The primary benefit of using FPGAs is that the resulting hardware is neither rigid nor permanent but is in-circuit reprogrammable. This allows each node to be tailored to the computational requirements of an application. This paper describes the Armstrong III architecture and concludes with a substantive example application that performs HMM training for speech recognition on the reconfigurable platform.
This paper presents a parallel computing model, called H-BSP, which adds a hierarchical concept to the BSP (Bulk Synchronous Parallel) computing model. An H-BSP program consists of a number of BSP groups that are dynamically created at run time and executed in a hierarchical fashion. H-BSP allows algorithm designers to develop more efficient algorithms by exploiting processor locality in the program. This paper describes the structure of the H-BSP model, its complexity analysis, and an example H-BSP algorithm. Also presented are the performance characteristics of H-BSP algorithms, based on simulation analysis. Simulation results show that the H-BSP model takes advantage of processor locality and performs well in low-bandwidth networks or in a constant-valence architecture such as a 2-dimensional mesh. It is also shown that the H-BSP model can predict algorithm performance better than the BSP model because of its locality-preserving nature.
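As background, the standard flat BSP model charges each superstep w + h*g + l, where w is the maximum local computation, h the maximum number of words any processor sends or receives, g the per-word communication cost, and l the barrier synchronization cost. The sketch below evaluates that formula; it covers plain BSP only, since the abstract does not give the hierarchical cost expressions of H-BSP, and the function name and sample numbers are illustrative.

```python
def bsp_cost(supersteps, g, l):
    """Total cost of a flat BSP program.

    `supersteps` is a sequence of (w, h) pairs: w is the maximum local work
    in the superstep and h is the maximum number of words sent or received
    by any processor. g is the cost per word, l the barrier cost.
    """
    return sum(w + h * g + l for w, h in supersteps)
```

For example, `bsp_cost([(100, 10), (50, 20)], g=4, l=30)` evaluates to 330. Intuitively, a hierarchical model can let small groups of nearby processors synchronize among themselves, which is one way to read the locality advantage the abstract reports.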
With advances in processor and networking technologies, current distributed-memory machines can achieve hundreds of Giga Floating-Point Operations Per Second (GFLOPS) of performance. On such machines, many application problems with regularly structured computations have been successfully parallelized using the explicit message-passing paradigm. However, it is difficult to parallelize vision problems that have irregularly structured computations. Parallel solutions to these problems are characterized by uneven distribution of symbolic features among the processors, unbalanced workload, and irregular interprocessor data dependency caused by the input image. It is therefore necessary to develop efficient algorithmic techniques to achieve large speed-ups. In this paper, we propose an algorithmic framework for designing efficient and portable parallel algorithms for irregular vision problems on distributed-memory machines. Based on this framework, we develop techniques for task scheduling, load balancing, and overlapping communication with computation.
ISBN (print): 0818680679
Estimating the communication cost involved in executing a program on distributed-memory machines is important for evaluating the overheads due to repartitioning. We present a scheme that works with reasonable efficiency for arrays with at most 3 dimensions. The Hyperplane Partitioning technique given by [10] is extended to complete programs by using the scheme presented in this work to estimate the communication cost.
Clusters of workstations are increasingly being viewed as a cost-effective alternative to parallel supercomputers. However, resource management and scheduling on workstation clusters are complicated by the fact that the number of idle workstations available for executing parallel applications is constantly fluctuating. In this paper, we present a case for scheduling parallel applications on non-dedicated workstation clusters using dynamic space-sharing, a policy under which the number of processors allocated to an application can be changed during its execution. We describe an approach that uses application-level checkpointing and data repartitioning to support dynamic space-sharing and to handle the dynamic reconfiguration triggered when failure or owner activity is detected on a workstation being used by a parallel application. The performance advantages of dynamic space-sharing are quantified through a simulation study, and experimental results are presented for the overhead of dynamic reconfiguration of a grid-oriented data-parallel application using our approach.
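To make the data repartitioning step concrete, the sketch below assumes a one-dimensional block distribution and counts how many items change owner when the processor count changes. The distribution scheme and the function names are assumptions for illustration; the abstract does not specify how the authors partition their grid data.

```python
def block_partition(n_items, n_procs):
    """Split range(n_items) into n_procs contiguous (start, stop) blocks.

    Block sizes differ by at most one; the first n_items % n_procs ranks
    receive the extra item.
    """
    base, extra = divmod(n_items, n_procs)
    ranges = []
    start = 0
    for rank in range(n_procs):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

def owners(n_items, n_procs):
    """Return a list mapping each item index to the rank that owns it."""
    result = []
    for rank, (start, stop) in enumerate(block_partition(n_items, n_procs)):
        result.extend([rank] * (stop - start))
    return result

def items_to_move(n_items, old_procs, new_procs):
    """Count items whose owning rank changes when the processor count changes."""
    old = owners(n_items, old_procs)
    new = owners(n_items, new_procs)
    return sum(1 for a, b in zip(old, new) if a != b)
```

For example, shrinking from 4 to 3 processors with 10 items moves 4 of them. A count like this is a crude proxy for reconfiguration overhead, since it ignores checkpointing cost and network speed.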
ISBN (print): 0819425885
This paper describes two different parallel computing approaches for image processing problems on a Pentium-based multiprocessor system. Such multiprocessor computers are often used as network servers. We demonstrate the use of one of these machines, equipped with four Intel Pentium processors, for a parallel image processing task. A parallel computation of motion vector fields based on correlation techniques is discussed to show the possible acceleration. The computational results show that high efficiency can be reached, and even linear speedup is possible under certain conditions. Besides the correlation technique mentioned, there are various image processing problems that can easily be evaluated in parallel. Although massively parallel systems and special-purpose systems are much faster, off-line image processing can be accelerated by using these broadly available low-cost machines.
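The terms speedup and efficiency used above have standard definitions, sketched below for reference. The timings in the usage note are made up for illustration and are not results from the paper.

```python
def speedup(t_serial, t_parallel):
    """Ratio of single-processor run time to parallel run time."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_procs):
    """Speedup divided by the processor count; 1.0 means linear speedup."""
    return speedup(t_serial, t_parallel) / n_procs
```

With four processors, a hypothetical task that takes 8.0 s serially and 2.0 s in parallel has speedup 4.0 and efficiency 1.0, which is the linear case; at 2.5 s the efficiency drops to about 0.8.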