We present the architecture of a parallel programming system, the di_pSystem. Our target machine consists of clusters of multiprocessors interconnected with very fast networks. The system aims to provide a programming...
In order to increase the overall performance of distributed parallel programs running in a network of non-dedicated workstations, we have researched methods for improving load balancing in loosely coupled heterogeneous distributed systems. Current software designed to handle distributed applications does not address the problem of forecasting the computers' future load. It only dispatches tasks, assigning them either to an idle CPU (in dedicated networks) or to the least loaded one (in non-dedicated networks). Our approach tries to improve the standard dispatching strategies provided by both parallel languages and libraries by implementing new dispatching criteria. It chooses the most suitable computer after forecasting the load of the individual machines based on current and historical data. Existing applications can take advantage of this new service with no changes other than a recompilation. A fair comparison between different dispatching algorithms can only be made if they run under the same external network load conditions. To that end, a tool was developed that replicates arbitrary historical observations of load parameters while the different strategies are running. In this environment, the new algorithms are being tested and compared to verify the improvement over the dispatching strategy already available. The overall performance of the system was tested with in-house developed numerical models. The project reported here is connected with other efforts at CeCal devoted to making it easier for scientists and developers to take advantage of parallel computing techniques using low-cost components.
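The dispatching idea in this abstract can be illustrated with a minimal sketch. The abstract does not say which forecasting model is used, so the exponential smoothing below, the function names `forecast_load` and `choose_host`, the smoothing factor `alpha`, and the host names are all assumptions made for illustration only.

```python
from typing import Dict, List


def forecast_load(history: List[float], alpha: float = 0.5) -> float:
    """One-step-ahead load forecast by exponential smoothing.

    Assumed model: the abstract only says that current and historical
    data are combined, not how.
    """
    if not history:
        raise ValueError("history must contain at least one observation")
    estimate = history[0]
    for observation in history[1:]:
        # Blend the newest observation with the running estimate.
        estimate = alpha * observation + (1 - alpha) * estimate
    return estimate


def choose_host(histories: Dict[str, List[float]], alpha: float = 0.5) -> str:
    """Pick the host with the lowest forecast load (hypothetical criterion)."""
    return min(histories, key=lambda host: forecast_load(histories[host], alpha))


# Hypothetical load histories, oldest observation first.
histories = {
    "ws1": [0.9, 0.8, 0.2],  # lowest load right now, but busy until recently
    "ws2": [0.3, 0.3, 0.3],  # steadily lightly loaded
}
selected = choose_host(histories)  # "ws2"
```

A dispatcher that looks only at the latest observation would pick `ws1` here; the forecast-based criterion prefers `ws2` because the history of `ws1` suggests its low load may not last.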
Traditionally, the Local Area Network (LAN) has been used for parallel programming with PVM and MPI. Improvements in Wireless Local Area Network (WLAN) communications, reaching up to 11 Mbps, make them, according t...
Synthetic aperture radar (SAR) imaging is a complex operation that requires a large number of floating-point computations. Converting SAR raw data into imagery on microcomputers is very time-consuming. The authors have successively developed SAR processors on two kinds of computers: the JN3AR, a distributed-memory multiprocessor (made in China), and the SGI Origin 200, a scalable shared-memory multiprocessor. The SAR imaging efficiency has been greatly improved.
The long-sought goal of parallel programming models is to scale parallel code without significant programming effort. Irregular parallel applications are a particularly challenging domain for parallel programming models, since they require domain-specific data distribution and load balancing algorithms. From a performance perspective, shared-memory models still fall short of scaling as well as message-passing models in irregular applications, although they require less coding effort. We present a simple runtime methodology for scaling irregular applications parallelized with the standard OpenMP interface. We claim that our parallelization methodology requires the minimum amount of effort from the programmer, and we show experimentally that it is able to scale two highly irregular codes as well as MPI, with an order of magnitude less programming effort. This is probably the first time such a result has been obtained with OpenMP, and moreover with the OpenMP API kept intact.
ISBN: 0769512585 (print)
In this paper we use the tensor product notation as the framework of a programming methodology for designing various parallel prefix algorithms. In this methodology, we first express a computational problem in its matrix form. Next, we formulate a matrix equation for the matrix of the computational problem. Then, we solve the matrix equation to obtain some simple matrices. Finally, we recursively factorize the subproblem to obtain a tensor product formula representing an algorithm for this problem. We use the parallel prefix computation problem to illustrate our methodology and derive various parallel prefix algorithms, including divide-and-conquer and recursive doubling algorithms.
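For readers unfamiliar with recursive doubling, the following is a minimal sketch of the textbook form of the algorithm, not the tensor product derivation from the paper. The function name `prefix_recursive_doubling` is chosen here for illustration. Each round is written as a list comprehension whose elements are independent of one another, so on a parallel machine every position could be updated simultaneously.

```python
from operator import add
from typing import Callable, List, TypeVar

T = TypeVar("T")


def prefix_recursive_doubling(
    values: List[T], op: Callable[[T, T], T] = add
) -> List[T]:
    """Inclusive prefix computation in about log2(n) data-parallel rounds.

    `op` must be associative; it does not need to be commutative.
    """
    result = list(values)
    stride = 1
    while stride < len(result):
        previous = result  # every element of this round reads the old values
        result = [
            previous[i] if i < stride else op(previous[i - stride], previous[i])
            for i in range(len(previous))
        ]
        stride *= 2  # the combining distance doubles each round
    return result


sums = prefix_recursive_doubling([1, 2, 3, 4, 5])      # [1, 3, 6, 10, 15]
words = prefix_recursive_doubling(["a", "b", "c"])     # ["a", "ab", "abc"]
```

The sketch performs more total operations than a sequential scan, which is the usual trade-off of recursive doubling: extra work in exchange for a logarithmic number of rounds.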
This paper presents the SCOOPP (SCalable Object Oriented parallel programming) approach to support the design and execution of scalable parallel applications. The SCOOPP programming model aims at portability, dynamic scalability and efficiency of parallel applications. SCOOPP is a hybrid compile-time and run-time system, which performs parallelism extraction, supports explicit parallelism and performs dynamic granularity control at run-time. The mechanism that supports dynamic grain-size adaptation is presented and its performance is evaluated on two parallel systems. The measured results show the feasibility of the proposed dynamic grain-size adaptation and a scalability improvement of parallel applications over static parallel OO environments, which suggests cost benefits in developing scalable parallel applications to run on multiple platforms.
The ability to dynamically adapt an unstructured grid (or mesh) is a powerful tool for solving computational problems with evolving physical features; however, an efficient parallel implementation is rather difficult, particularly from the viewpoint of portability across multiprocessor platforms. We address this problem by developing PLUM, an automatic and architecture-independent framework for adaptive numerical computations in a message-passing environment. Portability is demonstrated by comparing performance on an SP2, an Origin2000, and a T3E, without any code modifications. We also present a general-purpose load balancer that uses symmetric broadcast networks (SBN) as the underlying communication pattern, with the goal of providing a global view of system loads across processors. Experiments on an SP2 and an Origin2000 demonstrate the portability of our approach, which achieves superb load balance at the cost of minimal extra overhead.
This paper explores the transparent programmability of communicating parallel tasks in a Network of Workstations (NOW). Programs that are tied to specific machines will not be resilient to the changing conditions of a NOW. The Distributed Pipes (DP) model enables location-independent intertask communication among processes across machines. This approach enables migration of communicating parallel tasks according to runtime conditions. A transparent programming model for a parallel solution to Iterative Grid Computations using DP is also proposed. Programs written using the model are resilient to the heterogeneity of nodes and changing conditions in the NOW. They are also devoid of any network-related code. The design of the runtime support and the function library support is presented. An engineering problem, namely the Steady State Equilibrium Problem, is studied using the model. The performance analysis shows the speedup due to parallel execution and the scaled-down memory requirements. We present a case where the effect of communication overhead can be nullified to achieve a linear to super-linear speedup. The analysis discusses the performance resilience of Iterative Grid Computations and characterizes the synchronization delay among subtasks and the effect of network overhead and load fluctuations on performance. The performance saturation characteristics of such applications are also studied.
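As a generic illustration of what an iterative grid computation split into communicating subtasks looks like, here is a minimal one-dimensional sketch. It does not use the DP library, whose API the abstract does not describe; the functions `jacobi_step` and `jacobi_step_split`, the Jacobi averaging rule, and the example grid are assumptions chosen for illustration. Each subtask updates its own strip and needs only one boundary ("halo") value from its neighbour per iteration, which is the data that would travel over a pipe.

```python
from typing import List


def jacobi_step(u: List[float]) -> List[float]:
    """One Jacobi sweep on a 1-D grid (len(u) >= 2); endpoints stay fixed."""
    interior = [(u[i - 1] + u[i + 1]) / 2 for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]


def jacobi_step_split(u: List[float], cut: int) -> List[float]:
    """The same sweep done as two subtasks, with 1 <= cut <= len(u) - 1.

    The left subtask owns u[:cut] and receives u[cut] as its halo value;
    the right subtask owns u[cut:] and receives u[cut - 1].
    """
    left = u[: cut + 1]    # own cells plus one halo cell on the right
    right = u[cut - 1 :]   # one halo cell on the left plus own cells
    new_left = jacobi_step(left)[:-1]    # discard the halo cell
    new_right = jacobi_step(right)[1:]   # discard the halo cell
    return new_left + new_right


# Hypothetical steady-state problem: ends held at 0 and 100.
serial = split = [0.0, 0.0, 0.0, 0.0, 0.0, 100.0]
for _ in range(500):
    serial = jacobi_step(serial)
    split = jacobi_step_split(split, cut=3)
# Both versions produce identical grids and approach the linear profile
# 0, 20, 40, 60, 80, 100.
```

Because each subtask depends on its neighbour only through the halo value exchanged once per iteration, a slow or heavily loaded node delays the other subtask at that exchange point, which is the kind of synchronization delay the abstract refers to.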