ISBN (print): 354043786X
Co-array Fortran (Numrich and Reid 1998), abbreviated to CAF, is an extension of Fortran 95 for parallel programming that has been designed to be easy both for the compiler writer to implement and for the programmer to write and understand. It offers the prospect of clear and efficient parallel programming on homogeneous parallel systems. Each processor has an identical copy of the program and has its own data objects. Each co-array is evenly spread over all the processors, with each processor holding a part of exactly the same shape. The language is carefully designed so that implementations will usually use the same address on each processor for the processor's part of the co-array. Subscripts in round brackets are used in the usual way to address the local part, and subscripts in square brackets are used to address parts on other processors. References without square brackets are to local data, so code that can run independently is uncluttered. Only where there are square brackets, or a procedure call to code that involves square brackets, is there communication between processors. There are intrinsic procedures to synchronize processors, return the number of processors, and return the index of the current processor. A subset of Co-Array Fortran is available on the T3E, and the aim of this talk is to explain how it can be used effectively for computations on both full and sparse matrices. In particular, we show how the LINPACK benchmark can be written in this language and compare its performance with that of ScaLAPACK (Blackford et al. 1997), paying particular attention to the solution of a single set of equations. For sparse systems, we consider the use of CAF for frontal and multifrontal methods.
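The remote-addressing model described above can be illustrated, very roughly, outside Fortran. The sketch below is not taken from the paper and is not CAF itself; it expresses the analogous idea with MPI one-sided communication in C++: every process holds a local part of identical shape, the window plays the role of the co-array, an MPI_Get stands in for a square-bracketed remote reference, and the fences stand in for the synchronization intrinsics. All variable names are illustrative assumptions.

// Analogue of the co-array idea using MPI one-sided communication (not CAF).
#include <mpi.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int me, np;
    MPI_Comm_rank(MPI_COMM_WORLD, &me);   // analogue of the "current processor index" intrinsic
    MPI_Comm_size(MPI_COMM_WORLD, &np);   // analogue of the "number of processors" intrinsic

    // Local part of the "co-array": the same shape on every process.
    std::vector<double> a(100, static_cast<double>(me));

    // Expose the local part in a window so other processes can address it.
    MPI_Win win;
    MPI_Win_create(a.data(), a.size() * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    // Analogue of a square-bracketed reference: read element 0 from the left neighbour.
    int left = (me + np - 1) % np;
    double tmp = 0.0;
    MPI_Win_fence(0, win);                // analogue of a synchronization intrinsic
    MPI_Get(&tmp, 1, MPI_DOUBLE, left, 0, 1, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);

    std::printf("image %d read %f from image %d\n", me, tmp, left);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}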
ISBN (print): 0769516866
This work presents the design of the Coven framework for the construction of Problem Solving Environments (PSEs) for parallel computers. PSEs are an integral part of modern high performance computing (HPC), and Coven attempts to simplify PSE construction. Coven targets Beowulf cluster parallel computers but is independent of any particular domain for the PSE. Multi-threaded parallel applications created with Coven are capable of supporting most of the constructs in a typical parallel programming language. Coven uses an agent-based front-end which allows multiple custom interfaces to be constructed. Examples of the use of Coven in the construction of prototype PSEs are shown, and the effectiveness of these PSEs is evaluated in terms of the performance of the applications they generate.
ISBN (print): 0769516777
Stampede is a parallel programming system to support computationally demanding applications including interactive vision, speech and multimedia collaboration. The system alleviates concerns such as communication, synchronization, and buffer management in programming such real-time stream-oriented applications. Threads are loosely connected by channels that hold timestamped data items. There are two performance concerns when programming with Stampede. The first is space, namely, ensuring that memory is not wasted on items that are not fully processed. The second is time, namely, ensuring that processing resources are not wasted on a timestamp that is not fully processed. In this paper we introduce a single unifying framework, dead timestamp identification, that addresses both the space and time concerns simultaneously. Dead timestamps on a channel represent garbage. Dead timestamps at a thread represent computations that need not be performed. This framework has been implemented in the Stampede system. Experimental results showing the space advantage of this framework are presented. Using a color-based people tracker application, we show that the space advantage can be significant (up to 40%) compared to the previous garbage collection techniques in Stampede.
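To make the channel-side notion of a dead timestamp concrete, here is a small, self-contained C++ sketch. It is not the Stampede API; the class and member names are assumptions made for illustration. Each consumer registers a frontier of timestamps it still needs; a timestamp strictly below every frontier is dead, so its item can be reclaimed. That is the space half of the framework; the time half would use the same information to skip computing such timestamps at a thread.

// Conceptual sketch of dead-timestamp identification on a channel (not the Stampede API).
#include <map>
#include <vector>
#include <string>
#include <cstdio>

struct Channel {
    std::map<long, std::string> items;     // timestamp -> payload
    std::vector<long> consumer_frontier;   // lowest timestamp each consumer still needs

    int add_consumer() { consumer_frontier.push_back(0); return consumer_frontier.size() - 1; }
    void put(long ts, std::string payload) { items[ts] = std::move(payload); }

    // Consumer `c` declares it will never look at timestamps below `ts`.
    void advance(int c, long ts) { consumer_frontier[c] = ts; collect_dead(); }

    // A timestamp is dead if it lies below every consumer's frontier.
    void collect_dead() {
        long min_live = consumer_frontier.empty() ? 0 : consumer_frontier[0];
        for (long f : consumer_frontier) if (f < min_live) min_live = f;
        items.erase(items.begin(), items.lower_bound(min_live));  // reclaim garbage items
    }
};

int main() {
    Channel ch;
    int a = ch.add_consumer(), b = ch.add_consumer();
    for (long t = 0; t < 10; ++t) ch.put(t, "frame");
    ch.advance(a, 7);          // consumer a skips ahead (e.g. drops stale frames)
    ch.advance(b, 5);          // consumer b is slower
    std::printf("live items: %zu\n", ch.items.size());  // timestamps 5..9 remain
    return 0;
}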
ISBN (print): 0769516203
Data parallel compilers have long aimed to equal the performance of carefully hand-optimized parallel codes. For tightly-coupled applications based on line sweeps, this goal has been particularly elusive. In the Rice dHPF compiler, we have developed a wide spectrum of optimizations that enable us to closely approach hand-coded performance for tightly-coupled line sweep applications including the NAS SP and BT benchmark codes. From lightly-modified copies of standard serial versions of these benchmarks, dHPF generates MPI-based parallel code that is within 4% of the performance of the hand-crafted MPI implementations of these codes for a 102^3 problem size (Class B) on 64 processors. We describe and quantitatively evaluate the impact of the partitioning, communication and memory hierarchy optimizations implemented by dHPF that enable us to approach hand-coded performance with compiler-generated parallel code.
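For readers unfamiliar with why line sweeps are hard to parallelize, the C++/MPI sketch below shows the basic pipelined sweep pattern such compilers must generate. It is not dHPF output; the grid sizes and the update formula are invented for illustration. Each rank owns a block of columns, waits for the boundary column from its upstream neighbour, sweeps its block, then forwards its own boundary. Real implementations additionally tile the sweep into blocks of rows so that ranks can overlap computation with communication; the single end-of-block exchange here is a deliberate simplification.

// Pipelined line-sweep pattern over a 1-D block distribution of columns (illustrative only).
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int me, np;
    MPI_Comm_rank(MPI_COMM_WORLD, &me);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    const int ny = 64, local_nx = 32;              // assumed block of columns per rank
    std::vector<double> u(ny * local_nx, 1.0);     // local part of the 2-D grid
    std::vector<double> halo(ny, 0.0);             // boundary column from the left neighbour

    if (me > 0)  // wait for the upstream rank: this serializes ranks into a pipeline
        MPI_Recv(halo.data(), ny, MPI_DOUBLE, me - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    // Sweep left-to-right inside the local block (a first-order recurrence per row).
    for (int j = 0; j < ny; ++j) {
        double prev = halo[j];
        for (int i = 0; i < local_nx; ++i) {
            u[j * local_nx + i] = 0.5 * (u[j * local_nx + i] + prev);
            prev = u[j * local_nx + i];
        }
    }

    // Forward the rightmost column so the next rank can start its sweep.
    std::vector<double> edge(ny);
    for (int j = 0; j < ny; ++j) edge[j] = u[j * local_nx + (local_nx - 1)];
    if (me < np - 1)
        MPI_Send(edge.data(), ny, MPI_DOUBLE, me + 1, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}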
ISBN (print): 0780375149
Biomedical data analysis typically involves large data sets, a diverse user base and intensive visualization procedures. These features place stringent quality demands upon software development and performance for both the architect and supervisor. An invaluable tool for software supervisors would be the automatic qualitative grading of software objects in terms of their extensibility, reusability, clarity and efficiency. This paper examines the quality assessment of software objects by a multilayer perceptron in relation to a gold standard provided by two independent software architects.
ISBN (print): 3540440496
Today, parallel programming is dominated by message passing libraries such as MPI. Algorithmic skeletons intend to simplify parallel programming by increasing the expressive power. The idea is to offer typical parallel programming patterns as polymorphic higher-order functions which are efficiently implemented in parallel. The approach presented here integrates the main features of existing skeleton systems. Moreover, it does not come with a new programming language or language extension, which parallel programmers may hesitate to learn, but is offered in the form of a library which can easily be used by, e.g., C and C++ programmers. A major technical difficulty is to simulate the main requirements for a skeleton implementation, namely higher-order functions, partial applications, and polymorphism, as efficiently as possible in an imperative programming language. Experimental results based on a draft implementation of the suggested skeleton library show that this can be achieved without a significant performance penalty.
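As a rough illustration of what "a typical parallel pattern as a polymorphic higher-order function" looks like to a C++ programmer, here is a minimal map skeleton. It is not the paper's library: templates stand in for polymorphism, the function argument is the higher-order parameter, and a lambda (or std::bind) plays the role of partial application. Threads are used as a stand-in parallel implementation, and the names and sizes are assumptions.

// Minimal "map" skeleton as a polymorphic higher-order function (illustrative sketch).
#include <algorithm>
#include <thread>
#include <vector>
#include <cstdio>

template <typename T, typename R, typename F>
std::vector<R> map_skeleton(const std::vector<T>& in, F f, unsigned nthreads = 4) {
    std::vector<R> out(in.size());
    std::vector<std::thread> workers;
    size_t chunk = (in.size() + nthreads - 1) / nthreads;
    for (unsigned t = 0; t < nthreads; ++t) {
        size_t lo = t * chunk, hi = std::min(in.size(), lo + chunk);
        workers.emplace_back([&, lo, hi] {
            for (size_t i = lo; i < hi; ++i) out[i] = f(in[i]);  // apply f to this block
        });
    }
    for (auto& w : workers) w.join();
    return out;
}

int main() {
    std::vector<int> xs = {1, 2, 3, 4, 5, 6, 7, 8};
    auto squares = map_skeleton<int, int>(xs, [](int x) { return x * x; });
    for (int s : squares) std::printf("%d ", s);
    std::printf("\n");
    return 0;
}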
ISBN (print): 3540437924
In the past years computing has been moving from the sequential world to the parallel one, from centralised organisation to a decentralised one. In parallel programming the goal of the design process cannot be reduced to optimising a single metric such as speed. When evaluating a parallel program, a problem-specific function of execution time, memory requirements, communication cost, implementation cost, and other factors has to be taken into consideration. The paper deals with the use of the idea of program granularity in the evaluation of parallel programs. The obtained results suggest that the presented method can be used for performance evaluation of parallel programs.
ISBN (print): 0769517315
The paper considers the creation of intelligent solving machines and the arrangement of parallel programming in intelligent distributed multiprocessor systems based on them. Several main concepts are proposed. A system is designed for programming in the C+Graph high-level language. C+Graph provides efficient operation with knowledge (complicated data structures) and centralized-decentralized control exercised in a virtually distributed computation space. The parallel C+Graph programming model is based on a model used for multiple-flow monoprocessor programming in the basic Java language. The model operates in a virtual C+Graph machine network. The proposed ideology can be considered as an efficient development of structural high-level language interpretation when applied to multi-microprocessor systems. The equipment structure of the basic version of the intelligent solving machines is considered and some characteristics are discussed.
This work mainly aims to analyze the seismic acquisition quality while preparing adequate information for optimization of the subsequent Preserved Amplitude Prestack Depth Migration (PAPsDM). The main advantages are t...
Parallel programming is essential for large-scale scientific simulations, and MPI is intended to be a de facto standard API for this kind of programming. Since MPI has several functions that exhibit similar behaviors,...