ISBN: (print) 0818686030
The paper reports the design of a runtime library for data-parallel programming on clusters of symmetric multiprocessors (SMP clusters). Our design algorithms exploit a hybrid methodology that maps directly to the underlying hierarchical memory system in SMP clusters by combining two programming styles: threads (shared-memory programming) within an SMP node and message passing between SMP nodes. This hybrid approach has been used in the implementation of a library for collective communications. The prototype library is implemented on top of standard interfaces for threads (pthread) and message passing (MPI). Experimental results on a cluster of Sun UltraSparc-II workstations are reported.
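As a rough illustration of the two-level idea in this abstract, the sketch below simulates a sum reduction in which threads first combine values inside each node through shared memory, and the per-node results are then combined across nodes. It is not the library's API: the function names `node_local_sum` and `hybrid_allreduce_sum` are hypothetical, and a `queue.Queue` stands in for real message passing, whereas the paper's prototype uses pthread and MPI.

```python
import queue
import threading


def node_local_sum(chunks):
    """Shared-memory stage: one thread per chunk adds into a lock-protected total."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)      # each thread reduces its own chunk first
        with lock:                # then updates the shared total under a lock
            total += partial

    threads = [threading.Thread(target=worker, args=(chunk,)) for chunk in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def hybrid_allreduce_sum(cluster):
    """cluster is a list of nodes; each node is a list of per-thread chunks.

    Assumption: the queue below only simulates inter-node messages.
    """
    inbox = queue.Queue()
    for node_chunks in cluster:
        inbox.put(node_local_sum(node_chunks))   # one message per node
    global_sum = sum(inbox.get() for _ in cluster)
    return [global_sum] * len(cluster)           # every node receives the result


result = hybrid_allreduce_sum([[[1, 2], [3]], [[4], [5, 6]]])
```

The point of the split is that only one value per node crosses the (simulated) network, regardless of how many threads the node runs.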
This article presents the P-RIO environment, which offers high-level but straightforward concepts for parallel and distributed programming. A simple object-based software-construction methodology facilitates modularity and code reuse. This methodology promotes a clear separation of the individual sequential computation components from the interconnection structure used for their interaction. P-RIO maps the concepts of the software-construction methodology directly to their graphical representations. P-RIO offers a graphical programming tool, modular construction, high portability, and runtime support mechanisms for parallel programs on architectures composed of heterogeneous computing nodes.
Applications are increasingly being executed on computational systems that have hierarchical parallelism. There are several programming paradigms which may be used to adapt a program for execution in such an environment. In this paper, we outline some of the challenges in porting codes to such systems, and describe a programming environment that we are creating to support the migration of sequential and MPI code to a cluster of shared memory parallel systems, where the target program may include MPI, OpenMP or both. As part of this effort, we are evaluating several experimental approaches to aiding in this complex application development task.
The Euclidean distance transform (EDT) is an important tool in image analysis. Previous work on computing the EDT is limited to sequential algorithms and parallel algorithms on general-purpose architectures. The authors develop a fast parallel algorithm that is amenable to VLSI implementation. The VLSI architecture is presented. Results of implementing the VLSI design in a commercial package are also presented, and confirm the speed and suitability of the new method for real-time applications.
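For readers unfamiliar with the transform itself, the following minimal sketch computes an EDT by brute force. It only illustrates the definition and is not the authors' parallel VLSI algorithm. It assumes one common convention: each pixel receives the Euclidean distance to the nearest feature pixel (nonzero value). The name `brute_force_edt` is hypothetical.

```python
import math


def brute_force_edt(image):
    """Return, for every pixel, the Euclidean distance to the nearest feature pixel.

    Assumption: feature pixels are the nonzero entries of a 2D list.
    Cost is O(pixels * features), which is why faster algorithms matter.
    """
    features = [
        (r, c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value
    ]
    if not features:
        raise ValueError("image has no feature pixels")
    return [
        [min(math.hypot(r - fr, c - fc) for fr, fc in features)
         for c in range(len(row))]
        for r, row in enumerate(image)
    ]


image = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
dist = brute_force_edt(image)
```

Here the centre pixel is the only feature, so its distance is zero, its edge neighbours are at distance 1, and the corners are at distance equal to the square root of 2.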
In this paper, we describe our experience with developing Airshed, a large pollution modeling application, in the Fx programming environment. We demonstrate that high-level parallel programming languages like Fx and High Performance Fortran offer a simple and attractive model for developing portable and efficient parallel applications. Performance results are presented for the Airshed application executing on Intel Paragon and Cray T3D and T3E parallel computers. The results demonstrate that the application is "performance portable," i.e., it achieves good and consistent performance across different architectures, and that the performance can be explained and predicted using a simple model for the communication and computation phases in the program. We also show how task parallelism was used to alleviate I/O-related bottlenecks, an important consideration in many applications. Finally, we demonstrate how external parallel modules developed using different parallelization methods can be integrated in a relatively simple and flexible way with modules developed in the Fx compiler framework. Overall, our experience demonstrates that a high-level parallel programming environment based on a language like HPF is suitable for developing complex multidisciplinary applications. (C) 2000 Academic Press.
The paper introduces the concept of collective breakpoints and classifies the possible parallel breakpoints, comparing their mechanisms. Based on collective breakpoints, the macrostep-bp-macrostep debugging mode is defined. After introducing the concepts of the execution tree and meta-breakpoints, a novel systematic debugging methodology for message-passing parallel programs is explained. Finally, an algorithm is presented for automatically generating the collective breakpoints in the GRADE graphical parallel programming environment. (C) 2000 Published by Elsevier Science B.V. All rights reserved.
This paper extends research into rhombic overlapping-connectivity interconnection networks into the area of parallel applications. As a foundation for a shared-memory non-uniform access bus-based multiprocessor, these interconnection networks create overlapping groups of processors, buses, and memories, forming a clustered computer architecture where the clusters overlap. This overlapping-membership characteristic is shown to be useful for matching parallel application communication topology to the architecture's bandwidth characteristics. Many parallel applications can be mapped to the architecture topology so that most or all communication is localized within an overlapping cluster, at the low latency of processor direct to cache (or memory) over a bus. The latency of communication between parallel threads does not degrade parallel performance or limit the graininess of applications. Parallel applications can execute with good speedup and scaling on a proposed architecture which is designed to obtain maximum advantage from the overlapping-cluster characteristic, and which also allows dynamic workload migration without moving the instructions or data. Scalability limitations of bus-based shared-memory multiprocessors are overcome by judicious workload allocation schemes that take advantage of the overlapping-cluster memberships. Bus-based rhombic shared-memory multiprocessors are examined in terms of parallel speedup models to explain their advantages and justify their use as a foundation for the proposed computer architecture. Interconnection bandwidth is maximized with bi-directional circular and segmented overlapping buses. Strategies for mapping parallel application communication topologies to rhombic architectures are developed. Analytical models of enhanced rhombic multiprocessor performance are developed with a unique bandwidth modeling technique, and are compared with the results of simulation.
The authors explain the testing results they achieved in developing an experimental software tool called CASCH. This system provides a unified environment for performing automatic parallelization and scheduling of applications without relying on simulations.
Parallel execution is normally used to decrease the amount of time required to run a program. However, parallel execution may require far more space than sequential execution. Worse yet, the parallel space requirement may be much more difficult to predict than the sequential space requirement because there are more factors to consider. These include essentially nondeterministic factors that can influence scheduling, which in turn may dramatically influence space requirements. We survey some scheduling algorithms that attempt to place bounds on the amount of time and space used during parallel execution. We also outline a direction for future research. This direction takes us into the area of functional programming, where the declarative nature of the languages can help the programmer to produce correct parallel programs, a feat that can be difficult with procedural languages. Currently, the high-level nature of functional languages can make it difficult for the programmer to understand the operational behavior of the program. We look at some of the problems in this area, with the goal of achieving a programming environment that supports correct, efficient parallel programs. (C) 2000 Published by Elsevier Science B.V. All rights reserved.
In this approach, distributed services are developed by automatically generating Java implementations from designs expressed in SDL, a specification and description language. The authors discuss SDL's characteristics, limitations, and expected future improvements in the next release.