Weather forecasting models are computationally intensive applications. These models are typically executed on parallel machines, and a major obstacle to their scalability is load imbalance. The causes of such imbalance are either static (e.g. topography) or dynamic (e.g. shortwave radiation, moving thunderstorms). Various techniques, often embedded in the application's source code, have been used to address both sources. However, these techniques are inflexible and hard to apply to legacy codes. In this paper, we demonstrate the effectiveness of processor virtualization for dynamically balancing the load in BRAMS, a mesoscale weather forecasting model based on MPI parallelization. We use the Charm++ infrastructure, with its over-decomposition and object-migration capabilities, to move subdomains across processors during execution of the model. Processor virtualization enables better overlap between computation and communication and improved cache efficiency. Furthermore, by employing an appropriate load balancer, we achieve better processor utilization while requiring minimal changes to the model's code.
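To make the idea of over-decomposition plus load balancing concrete, here is a minimal sketch in Python. It is not the Charm++ API and not necessarily the balancer used in the paper: it assumes that each subdomain has a measured load and applies a simple greedy heuristic (heaviest subdomain first, placed on the currently least-loaded processor). The names `greedy_balance` and `max_load` are hypothetical, chosen only for this illustration.

```python
import heapq


def greedy_balance(loads, num_procs):
    """Map each subdomain index to a processor (hypothetical helper).

    Greedy heuristic: visit subdomains from heaviest to lightest and
    place each one on the processor with the smallest load so far.
    """
    # Min-heap of (accumulated load, processor id).
    heap = [(0.0, p) for p in range(num_procs)]
    heapq.heapify(heap)
    assignment = {}
    order = sorted(range(len(loads)), key=lambda i: loads[i], reverse=True)
    for sub in order:
        total, proc = heapq.heappop(heap)
        assignment[sub] = proc
        heapq.heappush(heap, (total + loads[sub], proc))
    return assignment


def max_load(loads, assignment, num_procs):
    """Return the load of the most heavily loaded processor."""
    totals = [0.0] * num_procs
    for sub, proc in assignment.items():
        totals[proc] += loads[sub]
    return max(totals)


# Eight subdomains on two processors; subdomain 0 is a "hot spot"
# (for example, a region containing a thunderstorm).
loads = [8, 1, 1, 1, 1, 1, 1, 2]
# Static block decomposition: first half on processor 0, second half on 1.
block = {i: (0 if i < 4 else 1) for i in range(len(loads))}
balanced = greedy_balance(loads, 2)
```

With only one subdomain per processor there would be nothing to move; having more subdomains than processors is what gives a runtime the freedom to even out the totals by migrating them.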
Multiple petaflops-class machines will appear during the coming year, and many multipetaflops machines are on the anvil. It will be a substantial challenge to make existing parallel CSE applications run efficiently on ...
ISBN: 3540393684 (print)
In recent years, a growing interest in Collaborative Virtual Environments (CVEs) can be observed. Users at different locations around the globe are able to communicate and interact in the same virtual space as if they were in the same physical location. Several approaches exist for the implementation of CVEs. General ideas for the design of Virtual Environments (VEs) are analyzed, and a novel approach is presented in the form of a highly extensible, flexible, and modular framework, inVRs.
Processor virtualization is a parallelization technique that may be used to enhance the performance of parallel applications through improved cache performance and the overlapping of communication and computation. In this study, we use the processor virtualization technique to parallelize the level set method for solving solidification problems. Numerical results on a distributed memory machine are reported to show the performance of the resulting level set solver and to demonstrate the advantages of using processor virtualization. © 2006 Elsevier Inc. All rights reserved.
ISBN: 3540260447 (print)
A growing interest in Collaborative Virtual Environments (CVEs) can be observed over the last few years. Geographically dispersed users share a common virtual space as if they were at the same physical location. Although Virtual Reality (VR) is heading more and more in the direction of creating lifelike environments and stimulating all of the users' senses, the technology does not yet allow communication and interaction as it is in the real world. A more abstract representation is sufficient in most CVEs. This paper provides an overview of tools which can be used to enhance communication and interaction in CVEs by visualising behaviour. Not only is a set of tools presented and classified; an implementation approach showing how to use these tools in a structured way, in the form of a framework, is also given.
Processor virtualization is a powerful technique that enables the runtime system to carry out intelligent adaptive optimizations like dynamic resource management. Charm++ is an early language/system that supports proc...
As a part of an ongoing effort to develop a "standard library" for scientific and engineering parallel applications, we have developed a preliminary finite element framework. This framework allows an applica...
Many parallel scientific applications have dynamic and irregular computational structure. However, most such applications exhibit persistence of computational load and communication structure. This allows us to embed ...
ISBN: 0818684038 (print)
Vector prefix and reduction are collective communication primitives in which all processors must cooperate. We present two parallel algorithms, the direct algorithm and the split algorithm, for vector prefix and reduction computation on coarse-grained, distributed-memory parallel machines. Our algorithms are relatively architecture independent and can be used effectively in many applications such as Pack/Unpack, Array Prefix/Reduction Functions, and Array Combining Scatter Functions, which are defined in Fortran 90 and in High Performance Fortran. Experimental results on the CM-5 are presented.
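For readers unfamiliar with the two primitives, the sketch below shows only what vector prefix and vector reduction compute, using a sequential Python reference over a list of per-processor vectors. It is not the direct algorithm or the split algorithm from the paper, whose details are not given in the abstract. The function names, the choice of an inclusive prefix, and elementwise addition as the default operator are assumptions made for illustration.

```python
from itertools import accumulate
from operator import add


def vector_prefix(vectors, op=add):
    """Inclusive elementwise prefix over per-processor vectors.

    vectors[i] stands for the vector held by (simulated) processor i.
    result[i][j] combines vectors[0][j] through vectors[i][j] with op.
    """
    def combine(a, b):
        return [op(x, y) for x, y in zip(a, b)]

    # Copy each input so the result never aliases the caller's lists.
    return list(accumulate((list(v) for v in vectors), combine))


def vector_reduce(vectors, op=add):
    """Elementwise reduction: the last row of the inclusive prefix."""
    prefix = vector_prefix(vectors, op)
    return prefix[-1] if prefix else []


# Three simulated processors, each holding a vector of length 2.
local = [[1, 2], [3, 4], [5, 6]]
prefix = vector_prefix(local)
total = vector_reduce(local)
```

In a distributed-memory setting each row of `prefix` would end up on a different processor, which is why all processors must cooperate to compute it.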
Vector prefix and reduction are collective communication primitives in which all processors must cooperate. The authors present two parallel algorithms, the direct algorithm and the split algorithm, for vector prefix and reduction computation on coarse-grained, distributed-memory parallel machines. The algorithms are relatively architecture independent and can be used effectively in many applications such as pack/unpack, array prefix/reduction functions, and array combining scatter functions, which are defined in Fortran 90 and in High Performance Fortran. Experimental results on the CM-5 are presented.