This paper presents the SCOOPP (SCalable Object Oriented parallel programming) approach to support the design and execution of scalable parallel applications. The SCOOPP programming model aims at portability, dynamic scalability and efficiency of parallel applications. SCOOPP is a hybrid compile-time and run-time system that can extract parallelism, supports explicit parallelism and performs dynamic granularity control at run-time. The mechanism that supports dynamic grain-size adaptation is presented and its performance is evaluated on two parallel systems. The measured results show the feasibility of the proposed dynamic grain-size adaptation and a scalability improvement of parallel applications over static parallel OO environments, which suggests cost benefits in developing scalable parallel applications to run on multiple platforms.
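The idea of run-time grain-size adaptation can be illustrated with a minimal Python sketch. This is not SCOOPP's actual mechanism, which the abstract does not detail; the function name `adapt_grain`, the doubling/halving policy, and all thresholds are assumptions chosen for illustration. The sketch packs more tasks into one batch when the useful work per batch is small relative to the dispatch overhead, and splits batches when they become too coarse.

```python
def adapt_grain(grain, task_time, overhead,
                target_ratio=10.0, min_grain=1, max_grain=1024):
    """Return a new grain size (tasks per batch).

    Hypothetical policy, not taken from the paper:
    - grain: current number of tasks packed into one batch
    - task_time: measured average time of one task (seconds)
    - overhead: measured cost of dispatching one batch (seconds)
    """
    work = grain * task_time
    # Too fine: dispatch overhead dominates, so pack more tasks together.
    if work < target_ratio * overhead:
        return min(grain * 2, max_grain)
    # Too coarse: batches are large enough to hurt load balance, so split.
    if work > 4 * target_ratio * overhead:
        return max(grain // 2, min_grain)
    # Within the accepted band: keep the current grain size.
    return grain
```

Calling this function periodically with fresh measurements lets the batch size follow the platform's actual communication cost rather than a value fixed at compile time.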
The ability to dynamically adapt an unstructured grid (or mesh) is a powerful tool for solving computational problems with evolving physical features; however, an efficient parallel implementation is rather difficult, particularly from the viewpoint of portability on various multiprocessor platforms. We address this problem by developing PLUM, an automatic and architecture-independent framework for adaptive numerical computations in a message-passing environment. Portability is demonstrated by comparing performance on an SP2, an Origin2000, and a T3E, without any code modifications. We also present a general-purpose load balancer that utilizes symmetric broadcast networks (SBN) as the underlying communication pattern, with the goal of providing a global view of system loads across processors. Experiments on an SP2 and an Origin2000 demonstrate the portability of our approach, which achieves superb load balance at the cost of minimal extra overhead.
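As a rough illustration of what a global view of system loads makes possible, the following Python sketch plans transfers of work units between processors once every load is known. It does not model symmetric broadcast networks or the paper's balancer; the function name `plan_transfers` and the greedy pairing policy are assumptions made for this example.

```python
def plan_transfers(loads):
    """Plan moves (source, destination, amount) that even out integer loads.

    Hypothetical greedy policy, assuming every processor's load is
    known globally; not the algorithm described in the paper.
    """
    n = len(loads)
    base, extra = divmod(sum(loads), n)
    # Targets differ by at most one unit when the total is not divisible by n.
    target = [base + (1 if i < extra else 0) for i in range(n)]
    surplus = [[i, loads[i] - target[i]] for i in range(n) if loads[i] > target[i]]
    deficit = [[i, target[i] - loads[i]] for i in range(n) if loads[i] < target[i]]
    moves = []
    # Total surplus equals total deficit, so both lists empty together.
    while surplus and deficit:
        src, dst = surplus[-1], deficit[-1]
        amount = min(src[1], dst[1])
        moves.append((src[0], dst[0], amount))
        src[1] -= amount
        dst[1] -= amount
        if src[1] == 0:
            surplus.pop()
        if dst[1] == 0:
            deficit.pop()
    return moves
```

Applying the returned moves leaves every processor within one work unit of the average load.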
This paper explores the transparent programmability of communicating parallel tasks in a Network of Workstations (NOW). Programs which are tied to specific machines will not be resilient to the changing conditions of a NOW. The Distributed Pipes (DP) model enables location-independent intertask communication among processes across machines. This approach enables migration of communicating parallel tasks according to runtime conditions. A transparent programming model for a parallel solution to Iterative Grid Computations using DP is also proposed. Programs written using the model are resilient to the heterogeneity of nodes and changing conditions in the NOW. They are also devoid of any network-related code. The design of the runtime support and function library support is presented. An engineering problem, namely, the Steady State Equilibrium Problem, is studied over the model. The performance analysis shows the speedup due to parallel execution and scaled-down memory requirements. We present a case where the effect of communication overhead can be nullified to achieve a linear to super-linear speedup. The analysis discusses performance resilience of Iterative Grid Computations and characterizes synchronization delay among subtasks and the effect of network overhead and load fluctuations on performance. The performance saturation characteristics of such applications are also studied.
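An iterative grid computation of the kind mentioned above can be sketched in Python as a Jacobi relaxation split into row strips. This is only an illustration under assumptions: the abstract does not specify the numerical scheme, and the names `jacobi_sweep` and `strip_sweep` are hypothetical. Each strip receives one boundary ("halo") row from each neighbour, which is the data that subtasks would exchange over a communication channel such as a pipe.

```python
def jacobi_sweep(grid):
    """One Jacobi sweep: each interior cell becomes the mean of its 4 neighbours."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new


def strip_sweep(grid, n_strips):
    """Same sweep, computed strip by strip as independent subtasks would.

    Assumes the grid has at least 3 rows and n_strips >= 1.
    """
    interior = list(range(1, len(grid) - 1))
    size = -(-len(interior) // n_strips)  # ceiling division
    new = [row[:] for row in grid]
    for s in range(0, len(interior), size):
        block = interior[s:s + size]
        lo, hi = block[0] - 1, block[-1] + 1
        # Local copy: the strip's own rows plus one halo row on each side.
        local = [grid[r][:] for r in range(lo, hi + 1)]
        swept = jacobi_sweep(local)
        # Row k of the local result corresponds to global row lo + k.
        for k, r in enumerate(block, start=1):
            new[r] = swept[k]
    return new
```

Because each strip only reads its halo rows from the previous iteration, the strips can be swept concurrently, and the result matches the whole-grid sweep.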
THE POWER-EFFICIENT IMAGINE STREAM PROCESSOR ACHIEVES PERFORMANCE DENSITIES COMPARABLE TO THOSE OF SPECIAL-PURPOSE EMBEDDED PROCESSORS. EXECUTING PROGRAMS MAPPED TO STREAMS AND KERNELS, A SINGLE IMAGINE PROCESSOR IS EXPECTED TO HAVE A PEAK PERFORMANCE OF 20 GFLOPS AND SUSTAIN 18.3 GOPS ON MPEG-2 ENCODING.
A general philosophy is presented in which all the modules within the computational cycle are parallelised and executed on parallel computer hardware, thereby avoiding the creation of computational bottlenecks. In particular, unstructured mesh generation with adaption, computational fluid dynamics and computational electromagnetic solvers and the visualisation of grid and solution data are all performed in parallel. In addition, all these modules are embedded within a parallel problem solving environment. This paper will provide an overview of these developments. In particular, details of the parallel mesh generator, which has been used to generate meshes in excess of 100 million elements, will be given. A brief overview will be presented of the approach used to parallelise the solvers and how large data sets are interrogated and visualised on distributed computer platforms. Details of the parallel adaption algorithm will be presented. These parallel component modules are linked using CORBA communications to provide an integrated parallel approach for large scale simulations. Several examples are given of the approach applied to the simulation of large aerospace calculations in the field of aerodynamics and electromagnetics.
The major contribution of this paper is the application of modern analysis techniques to the important Message Passing Interface standard, work done in order to obtain information useful in designing both application programmer interfaces for object-oriented languages, and message passing systems. Recognition of 'Design Patterns' within MPI is an important discernment of this work. A further contribution is a comparative discussion of the design and evolution of three actual object-oriented designs for the Message Passing Interface (MPI-1) application programmer interface (API), two of which have influenced the standardization of C++ explicit parallel programming with MPI-2, and which strongly indicate the value of a priori object-oriented design and analysis of such APIs. Knowledge of design patterns is assumed herein. Discussion provided here includes systems developed at Mississippi State University (MPI++), the University of Notre Dame (OOMPI), and the merger of these systems that results in a standard binding within the MPI-2 standard. Commentary concerning additional opportunities for further object-oriented analysis and design of message passing systems and APIs, such as MPI-2 and MPI/RT, is offered in conclusion. Connection of modern software design and engineering principles to high performance computing programming approaches is a new and important further contribution of this work. Copyright (C) 2001 John Wiley & Sons, Ltd.
Authors: Bowen, JP; He, JF
S Bank Univ, Ctr Appl Formal Methods, Sch Comp Informat Syst & Math, Borough Rd, London SE1 0AA, England
UN Univ, Int Inst Software Technol, Macau, Macao, Peoples R China
The use of Field Programmable Gate Arrays (FPGA) to produce custom hardware circuits rapidly using a completely software-based process is becoming increasingly widespread. Specialized Hardware Description Languages (HDL) are used to describe and develop the required circuits. In this paper, we advocate using an even more general purpose programming language, based on Occam, for the automatic compilation of high-level programs to low-level circuits. The parallel constructs of Occam can map directly to hardware as conveniently as to software, with potentially dramatic speed-up of highly parallel algorithms. We demonstrate that the compilation process can be verified using algebraic refinement laws, increasing the confidence in its correctness. Verification is particularly important in high-integrity systems where safety or security is paramount. A prototype compiler has also been produced very directly from the theorems using the logic programming language Prolog.
We describe FATCOP 2.0, a new parallel mixed integer program solver that works in an opportunistic computing environment provided by the Condor resource management system. We outline changes to the search strategy of FATCOP 1.0 that are necessary to improve resource utilization, together with new techniques to exploit heterogeneous resources. We detail several advanced features in the code that are necessary for successful solution of a variety of mixed integer test problems, along with the different usage schemes that are pertinent to our particular computing environment. Computational results demonstrating the effects of the changes are provided and used to generate effective default strategies for the FATCOP solver.
A functional scheme is described to parallelize computer simulations of grid-based ecological landscape models. The method is implemented using the Message Passing Interface protocol and is applied to the Everglades Landscape Vegetation Model. On a two-processor system, the speed-up is satisfactory and the overall performance of the program is competitive with traditional parallelization techniques such as geometrical decomposition. The method is discussed, timing information is provided for three different parallel machines, and some further developments are indicated. (C) 2001 Elsevier Science B.V. All rights reserved.
In human subjects, two mechanisms for improving the efficiency of saccades in visual search have recently been described: color priming and concurrent processing of two saccades. Since the monkey provides an important model for understanding the neural underpinnings of target selection in visual search, we sought to explore the degree to which the saccadic system of monkeys uses these same mechanisms. Therefore, we recorded the eye movements of rhesus monkeys performing a simple color-oddity pop-out search task, similar to that used previously with human subjects. The monkeys were rewarded for making a saccade to the odd-colored target, which was presented with an array of three distracters. The target and distracters were randomly chosen to be red or green in each trial. Similar to what was previously observed for humans, we found that monkeys show the influence of a cumulative, short-term priming mechanism which facilitates saccades when the color of the search target happens to repeat from trial to trial. Furthermore, we found that, like humans, when monkeys make an erroneous initial saccade to a distracter, they are capable of executing a second saccade to the target after a very brief inter-saccadic interval, suggesting that the two saccades have been programmed concurrently (i.e. in parallel). These results demonstrate a close similarity between human and monkey performance. We also made a new observation: we found that when monkeys make such two-saccade responses, the trajectory of the initial saccade tends to curve toward the goal of the subsequent saccade. This provides evidence that the two saccade goals are simultaneously represented on a common motor map, supporting the idea that the movements are processed concurrently. It also indicates that concurrent processing is not limited to brain areas involved in higher-level planning; rather, such parallel programming apparently occurs at a low enough level in the saccadic system that it can affect saccade traj