ISBN (print): 3540297693
This article presents the C++ library vShark, which reduces the intra-node communication overhead of parallel programs on clusters of SMPs. The library is built on top of message-passing libraries like MPI to provide thread-safe communication and, most importantly, to improve the communication between threads within one SMP node. vShark uses a modular but transparent design which makes it independent of specific communication libraries; thus, different subsystems such as MPI, CORBA, or PVM could also be used for low-level communication. We present an implementation of vShark based on MPI and the POSIX thread library, and show that the efficient intra-node communication of vShark improves the performance of parallel algorithms.
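The core idea can be sketched in a few lines: when sender and receiver are threads on the same SMP node, a message can be handed over through shared memory instead of going through the message-passing subsystem. The class and names below are a hypothetical illustration, not the vShark API:

```python
import queue
import threading

# Hypothetical sketch (not the vShark API): messages between threads on the
# same SMP node travel through a shared-memory queue, avoiding the
# message-passing subsystem entirely -- the overhead vShark targets.
class IntraNodeChannel:
    def __init__(self):
        self._q = queue.Queue()   # thread-safe FIFO in shared memory

    def send(self, msg):
        self._q.put(msg)          # shared-memory handoff, no MPI call

    def recv(self):
        return self._q.get()      # blocks until a message arrives

def demo():
    ch = IntraNodeChannel()
    t = threading.Thread(target=ch.send, args=("hello from thread 0",))
    t.start()
    msg = ch.recv()
    t.join()
    return msg
```

A real implementation would additionally route messages whose destination lies on another node through the underlying subsystem (MPI, CORBA, or PVM), which is where the modular design matters.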
We compare the performance of three major programming models on a modern, 64-processor hardware cache-coherent machine, one of the two major types of platforms upon which high-performance computing is converging. We focus on applications that are either regular and predictable, or at least do not require fine-grained dynamic replication of irregularly accessed data. Within this class, we use programs with a range of important communication patterns. We examine whether the basic parallel algorithm and communication structuring approaches needed for best performance are similar or different among the models, whether some models have substantial performance advantages over others as problem size and number of processors change, what the sources of these performance differences are, where the programs spend their time, and whether substantial improvements can be obtained by modifying either the application programming interfaces or the implementations of the programming models on this type of tightly coupled multiprocessor platform.
Technological directions for innovative HPC software environments are discussed in this paper. We focus on industrial user requirements: heterogeneous multidisciplinary applications, performance portability, rapid prototyping and software reuse, and integration and interoperability of standard tools. The various issues are demonstrated with reference to the PQE2000 project and its programming environment, the Skeleton-based Integrated Environment (SkIE). SkIE includes a coordination language, SkIECL, allowing designers to express, in a primitive and structured way, efficient combinations of data parallelism and task parallelism. The goal is achieving fast development and good efficiency for applications in different areas. Modules developed with standard languages and tools are encapsulated into SkIECL structures to form the global application. Performance models associated with the coordination language allow powerful optimizations to be introduced both at run time and at compile time without the direct intervention of the programmer. The paper also discusses the features of the SkIE environment related to debugging, performance analysis tools, visualization, and the graphical user interface. A discussion of the results achieved in some applications developed using the environment concludes the paper. (C) 1999 Elsevier Science B.V. All rights reserved.
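The skeleton idea behind such coordination languages can be illustrated with a minimal "farm" sketch: a sequential worker module is applied to a stream of inputs, and the parallelism lives in the skeleton, not in the programmer's code. This is an illustrative analogy, not SkIECL syntax:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch of a "farm" skeleton in the spirit of skeleton-based
# coordination (not SkIECL): the skeleton supplies the parallel structure,
# the worker stays a plain sequential function.
def farm(worker, inputs, nworkers=4):
    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        # pool.map preserves input order, so results line up with inputs
        return list(pool.map(worker, inputs))
```

In a skeleton system, compositions of such templates (farms, pipelines, maps) give the compiler and runtime the performance model it needs to optimize without programmer intervention.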
Today, data-parallel programming models are the most successful programming models for parallel computers, both in terms of efficiency of execution and ease of use for the programmer. However, there is no parallel programming model that is conceptually simple and abstract and that can be ported efficiently to the variety of parallel architectures available. The nested data-parallel programming model has some of the desired properties of a parallel programming model. In contrast to flat data-parallel models, with this model it is possible to express irregular data structures and irregular parallel computations directly. In this paper, a collection-oriented approach to nested data parallelism is introduced. The state of the art of related research is presented and open questions are identified.
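The classic trick for executing nested data parallelism on flat hardware is flattening: an irregular nested sequence is stored as one flat array plus segment lengths, so a per-segment reduction becomes a single regular pass. The sketch below is an illustration of this idea (in the spirit of NESL-style segmented operations), not code from the paper:

```python
# Hypothetical illustration: segmented sum over a flattened nested sequence.
# values holds all inner elements back to back; seg_lengths records how many
# elements belong to each inner (possibly empty) sequence.
def segmented_sum(values, seg_lengths):
    sums, i = [], 0
    for length in seg_lengths:          # one result per inner sequence
        sums.append(sum(values[i:i + length]))
        i += length
    return sums
```

For example, the nested sequence [[1, 2], [3], [4, 5, 6]] flattens to values [1, 2, 3, 4, 5, 6] with segment lengths [2, 1, 3], and the segmented sum yields one value per irregular segment.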
We survey parallel programming models and languages using six criteria to assess their suitability for realistic portable parallel programming. We argue that an ideal model should be easy to program, should have a software development methodology, should be architecture-independent, should be easy to understand, should guarantee performance, and should provide accurate information about the cost of programs. These criteria reflect our belief that developments in parallelism must be driven by a parallel software industry based on portability and efficiency. We consider programming models in six categories, depending on the level of abstraction they provide. Those that are very abstract conceal even the presence of parallelism at the software level. Such models make software easy to build and port, but efficient and predictable performance is usually hard to achieve. At the other end of the spectrum, low-level models make all of the messy issues of parallel programming explicit (how many threads, how to place them, how to express communication, and how to schedule communication), so that software is hard to build and not very portable, but is usually efficient. Most recent models are near the center of this spectrum, exploring the best tradeoffs between expressiveness and performance. A few models have achieved both abstractness and efficiency. Both kinds of models raise the possibility of parallelism as part of the mainstream of computing.
Highly parallel machines needed to solve compute-intensive scientific applications are based on the distribution of physical memory across the compute nodes. The drawback of such systems is the necessity to write applications in the message-passing programming model. Therefore, a lot of research is going on into higher-level programming models and supporting hardware, operating-system techniques, and languages. The research direction outlined in this article is based on shared virtual memory (SVM) systems, i.e., scalable parallel systems with a global address space which support an adaptive mapping of global addresses to physical memories. We introduce programming concepts and program optimizations for SVM systems in the context of the SVM-Fortran programming environment, which is based on a shared virtual memory system implemented on the Intel Paragon. Performance results for real applications show that this environment enables users to obtain similar or better performance than by programming in HPF. (C) 1998 Elsevier Science B.V. All rights reserved.
Author: Merigot, A. (Univ Paris 11, CNRS URA 22, Integrated Circuits & Systems Architecture Group, Fundamental Electronics Institute, Orsay, France)
This paper presents a new parallel computing model called Associative Nets. This model relies on basic primitives called associations, which apply an associative operator over connected components of a subgraph of the physical interprocessor connection graph. Associations can be implemented very efficiently (in terms of hardware cost or processing time) thanks to asynchronous computation. This model is quite effective for image analysis and several other fields; as an example, graph processing algorithms are presented. While relying on a much simpler architecture, these algorithms have, in general, a complexity equivalent to that obtained by more expensive computing models, like the PRAM model.
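The essence of an association can be sketched sequentially: reduce each connected component of a graph with an associative operator and broadcast the result back to every node of that component. The names and data below are illustrative, and a union-find pass stands in for the asynchronous hardware propagation the model assumes:

```python
# Hypothetical sketch of an "association": apply an associative operator
# (default: max) over each connected component of an undirected graph and
# give every node its component's result.
def component_reduce(n, edges, values, op=max):
    parent = list(range(n))

    def find(x):                       # path-halving union-find
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:                 # union the endpoints of each edge
        parent[find(u)] = find(v)

    acc = {}                           # reduce values within each component
    for node in range(n):
        r = find(node)
        acc[r] = values[node] if r not in acc else op(acc[r], values[node])
    return [acc[find(node)] for node in range(n)]
```

On the actual model, the same computation would run asynchronously across the interprocessor connection graph, which is what makes associations cheap in hardware.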
Portability, efficiency, and ease of coding are all important considerations in choosing the programming model for a scalable parallel application. The message-passing programming model is widely used because of its portability, yet some applications are too complex to code in it while also trying to maintain a balanced computation load and avoid redundant computations. The shared-memory programming model simplifies coding, but it is not portable and often provides little control over interprocessor data transfer costs. This paper describes an approach, called Global Arrays (GAs), that combines the better features of both other models, leading to both simple coding and efficient execution. The key concept of GAs is that they provide a portable interface through which each process in a MIMD parallel program can asynchronously access logical blocks of physically distributed matrices, with no need for explicit cooperation by other processes. We have implemented the GA library on a variety of computer systems, including the Intel Delta and Paragon, the IBM SP-1 and SP-2 (all message passers), the Kendall Square Research KSR-1/2 and the Convex SPP-1200 (nonuniform-access shared-memory machines), the CRAY T3D (a globally addressable distributed-memory computer), and networks of UNIX workstations. We discuss the design and implementation of these libraries, report their performance, illustrate the use of GAs in the context of computational chemistry applications, and describe the use of a GA performance visualization tool.
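The key concept can be mimicked in a toy sketch: the matrix is physically split into row blocks owned by different "processes", yet any caller can fetch an arbitrary logical range of rows without the owners participating. This is a hypothetical illustration of the idea, not the GA library API:

```python
# Hypothetical sketch of the Global Arrays idea (not the GA API): a matrix
# is stored as row blocks, one per owning "process"; get() addresses logical
# global rows and transparently gathers them across owners.
class GlobalArray:
    def __init__(self, blocks):
        self.blocks = blocks            # blocks[i] = rows owned by process i
        self.offsets = []               # first global row index of each block
        row = 0
        for b in blocks:
            self.offsets.append(row)
            row += len(b)
        self.nrows = row

    def get(self, lo, hi):
        """Fetch global rows lo..hi-1, possibly spanning several owners."""
        out = []
        for off, b in zip(self.offsets, self.blocks):
            for i, r in enumerate(b):
                if lo <= off + i < hi:
                    out.append(list(r))  # one-sided copy; owner not involved
        return out
```

In the real library, the one-sided access maps onto remote memory operations or interrupt-driven messaging depending on the platform, which is what makes the interface portable across the machines listed above.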
A Polymorphic Processor Array (PPA) is a two-dimensional mesh-connected array of processors, in which each processor is equipped with a switch able to interconnect its four NEWS ports. PPA is an abstract architecture based upon the experience acquired in the design and implementation of a VLSI chip, namely the Polymorphic Torus (PT) chip, and, as a consequence, it only includes capabilities that have been proved to be supported by cost-effective hardware structures. The main claims of PPA are that 1) it models a realistic class of parallel computers, 2) it supports the definition of high-level programming models, 3) it supports virtual parallelism, and 4) it supports low-complexity algorithms in a number of application fields. In this paper we present both the PPA computation model and the PPA programming model; we show that the PPA computation model is realistic by relating it to the design of the PT chip, and show that the PPA programming model is scalable by demonstrating that any algorithm having O(p) complexity on a virtual PPA of size √m × √m has O(kp) complexity on a PPA of size √n × √n, with m = kn and k integer. We finally show some application algorithms in the area of numerical analysis and graph processing.
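The O(kp) bound can be read as a simple simulation argument (a sketch, not the paper's proof): with m = kn, each of the n physical processors simulates k virtual processors, so every O(1) step of the virtual machine costs O(k) physical steps:

```latex
T_{\mathrm{phys}} \;=\; k \cdot T_{\mathrm{virt}} \;=\; k \cdot O(p) \;=\; O(kp),
\qquad k = \frac{m}{n}.
```

The slowdown is thus exactly the virtualization ratio, which is what makes the programming model scalable independently of the physical array size.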
Despite rapid growth in workstation and networking technologies, the workstation environment continues to pose challenging problems for shared processing. In this paper, we present a computational model and system for the generation of distributed applications in such an environment. The well-known RPC model is modified by a novel concept known as template attachment. A computation consists of a network of sequential procedures which have been encapsulated in templates. A small selection of templates is available from which a distributed application with the desired communication behavior can be rapidly built. The system generates all the required low-level code for correct synchronization, communication, and scheduling. This results in a system that is easy to use and flexible, and can provide a programmer with the desired amount of control in using idle processing power over a network of workstations. The practical feasibility of the model has been demonstrated by implementing it for Unix-based workstation environments.
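The template-attachment idea can be sketched as follows: a plain sequential procedure is wrapped in a template that supplies the queueing, scheduling, and synchronization code the system would otherwise generate. All names here are invented for illustration; the real system generates equivalent low-level code:

```python
import queue
import threading

# Hypothetical sketch of template attachment: WorkerTemplate encapsulates a
# sequential procedure and provides RPC-style communication behavior around
# it (inbox/outbox queues, a service thread, synchronous call semantics).
class WorkerTemplate:
    def __init__(self, procedure):
        self.procedure = procedure
        self.inbox, self.outbox = queue.Queue(), queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):                    # scheduling loop the template supplies
        while True:
            item = self.inbox.get()
            if item is None:           # sentinel: shut the worker down
                break
            self.outbox.put(self.procedure(item))

    def call(self, arg):               # synchronous, RPC-like invocation
        self.inbox.put(arg)
        return self.outbox.get()

    def stop(self):
        self.inbox.put(None)
        self.thread.join()
```

Composing several such templates (pipelines, master-worker farms) over different hosts would yield the network of sequential procedures the model describes.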