parallel applications with inconstant usage patterns presents a big challenge to programmers in that the spawning of tasks and the communication between them may be conditional (named "conditional parallel programming"). Ideally, the programmer should not be burdened by operational issues which have little relationship to the application itself. This paper proposes a new parallel programming environment, ATME, to automate task scheduling in conditional parallel programming. By adaptively producing accurate estimates of the task model prior to execution, ATME modifies task distribution to improve the system and application performance.
This paper presents GPS, a programming tool for developing parallel applications in a networked environment. The programming model is based on message passing and process groups. The process group approach is very attractive for modeling parallel applications because it is a natural concept that stays close to the application structure. The programming interface provided by GPS differs from other message-passing interfaces in its simplicity, generality and ease of use. The implementation of the tool, also covered in the paper, follows a new idea in distributed systems design: it is based on a microkernel environment.
The compilation of high-level programming languages for parallel machines faces two challenges: maximizing data/process locality and balancing load. No solutions for the general case are known that solve both problems at once. The present paper describes a programming model that allows both problems to be solved for the special case of neural network learning algorithms, even for irregular networks with dynamically changing topology (constructive neural algorithms). The model is based on the observation that such algorithms predominantly execute local operations (on nodes and connections of the network), reductions, and broadcasts. The model is made concrete in an object-centered procedural language called CuPit. The language is completely abstract: no aspects of the parallel implementation, such as the number of processors, data distribution, process distribution, or execution model, are visible in user programs. The compiler can derive most of the information relevant to generating efficient code from unannotated source code. Therefore, CuPit programs are efficiently portable. A compiler for CuPit has been built for the MasPar MP-1/MP-2 using compilation techniques that can also be applied to most other parallel machines. The paper briefly presents the main ideas of the techniques used and the results obtained by the various optimizations.
The mpC language is an ANSI C superset supporting modular parallel programming for distributed memory machines. It allows the user to specify an application topology dynamically, and the mpC programming environment uses this information at run time to provide the most efficient execution of the program on any particular distributed memory machine. The paper describes the features of mpC and its programming environment that make them suitable for developing libraries of parallel programs.
The complexity of characterizing both parallel hardware and software makes it very difficult to explain and predict the performance of parallel programs for real industrial CFD applications. A performance model based on a generalized Amdahl's formulation has been developed and applied to a flow solver. The present formulation allows us to explain the behavior of a typical explicit multiblock CFD solver when the program is run on a multiprocessor distributed-memory system. Using this approach, it is possible to gain insight into the performance limitations of this class of parallel solvers by considering the impact of increasing numbers of processors on fixed-scaled problems. (C) 1999 Academic Press, Inc.
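As a rough illustration of the kind of reasoning this abstract refers to, the sketch below evaluates the classical Amdahl's law for a fixed-size problem. It is not the generalized formulation developed in the paper, which the abstract does not spell out; the function names and the parallel fraction used in the usage note are assumptions chosen only for illustration.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Classical Amdahl speedup for a fixed-size problem.

    parallel_fraction is the share of the serial runtime that can be
    parallelized (hypothetical input, not a value from the paper).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def speedup_table(parallel_fraction: float, processor_counts):
    """Return (processors, speedup) pairs for a list of processor counts."""
    return [(p, amdahl_speedup(parallel_fraction, p)) for p in processor_counts]
```

Calling `speedup_table(0.95, [1, 2, 4, 8, 16])` shows the speedup flattening as processors are added; with a parallel fraction of 0.95 the speedup can never exceed 20, however many processors are used.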
This paper presents a multi-level frontal algorithm together with its implementation and applications in parallel computation. A multi-frontal program is given which may be used for unsymmetric finite element matrix equations. The parallel program is developed on a cluster of workstations. The PVM (parallel virtual machine) system is used to handle communications among the networked workstations. The advantages of the method include arbitrary numbering of the finite element mesh, simple program organisation, smaller core requirements and shorter computation times. An implementation of this parallel method on workstations is discussed, and its speedup and efficiency are demonstrated through numerical examples and compared with a general domain decomposition method based on band matrix methods.
Software parallelization is required to contend with the increasing scale and complexity of High-Energy Physics experiments. The authors have developed a programming model, Communication Capability (CoCa), which allows this parallelization at several levels of granularity and reduces software complexity.
Based on the framework of BSP, a Hierarchical Bulk Synchronous Parallel (HBSP) performance model is introduced in this paper to capture the performance optimization problem for various stages in parallel program development and to accurately predict the performance of a parallel program by considering factors causing variance at local computation and global communication. The related methodology has been applied to several real applications and the results show that HBSP is a suitable model for optimizing parallel programs.
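For readers unfamiliar with the underlying framework, the sketch below computes the cost of a program under the classical flat BSP model, where each superstep costs w + h*g + l. It does not implement the hierarchical HBSP model, whose details are not given in the abstract; the `Superstep` class, the function name, and the numbers in the usage note are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Superstep:
    """One BSP superstep (hypothetical container, not from the paper)."""
    work: float  # w: longest local computation time on any processor
    h: int       # h: largest number of words sent or received by any processor


def bsp_cost(supersteps: Iterable[Superstep], g: float, l: float) -> float:
    """Total cost under the classical BSP model.

    g is the communication cost per word and l is the barrier
    synchronization cost, both in the same time unit as `work`.
    """
    return sum(step.work + step.h * g + l for step in supersteps)
```

For example, `bsp_cost([Superstep(100, 10), Superstep(50, 0)], g=2, l=5)` gives 180: the first superstep costs 100 + 10*2 + 5 and the second costs 50 + 0 + 5.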
Performance modeling for large industrial or scientific codes is of value for program tuning or for selection of new machines when benchmarking is not yet possible. We discuss an empirical method of estimating runtime for certain large parallel programs, in which computational work is estimated by regression functions based on measurements and the time cost of communication is modeled by program analysis and benchmarks for communication primitives. The method is demonstrated with the local weather model (LM) of the German Weather Service (DWD) on SP-2, T3E, and SX-4. The method is an economical way of developing performance models because only a moderate number of measurements is required. The resulting model is sufficiently accurate even for very large test cases. (C) 1999 Elsevier Science B.V. All rights reserved.
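The general idea of combining a regression-based compute estimate with a benchmark-based communication estimate can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the paper: it assumes a straight-line fit of compute time against problem size and a simple latency-plus-bandwidth cost per message, and all function and parameter names are hypothetical.

```python
def fit_line(sizes, times):
    """Ordinary least-squares fit of times = intercept + slope * sizes."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def comm_time(messages, bytes_per_message, latency, bandwidth):
    """Assumed communication model: each message costs latency + size / bandwidth."""
    return messages * (latency + bytes_per_message / bandwidth)


def predict_runtime(size, fit, messages, bytes_per_message, latency, bandwidth):
    """Predicted runtime = regression estimate of compute + modeled communication."""
    intercept, slope = fit
    compute = intercept + slope * size
    return compute + comm_time(messages, bytes_per_message, latency, bandwidth)
```

In use, `fit_line` would be fed a moderate number of measured (problem size, compute time) pairs, while `latency` and `bandwidth` would come from benchmarks of the communication primitives on the target machine.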