Search results - Inner Mongolia University Library

International Conference on High Performance Computing

Authors: Eduardo R. Rodrigues; Philippe O. A. Navaux; Jairo Panetta; Celso L. Mendes; Laxmikant V. Kalé. Affiliations: Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil; Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil; Center of Weather Forecasts and Climate Studies, Cachoeira Paulista, Brazil; Center for Weather Forecasts and Climate Studies/INPE, Cachoeira Paulista, Brazil; Parallel Programming Laboratory, University of Illinois at Urbana-Champaign, Urbana, USA

Weather forecasting models are computationally intensive applications. These models are typically executed in parallel machines and a major obstacle for their scalability is load imbalance. The causes of such imbalance are either static (e.g. topography) or dynamic (e.g. shortwave radiation, moving thunderstorms). Various techniques, often embedded in the application's source code, have been used to address both sources. However, these techniques are inflexible and hard to use in legacy codes. In this paper, we demonstrate the effectiveness of processor virtualization for dynamically balancing the load in BRAMS, a mesoscale weather forecasting model based on MPI parallelization. We use the Charm++ infrastructure, with its over-decomposition and object-migration capabilities, to move subdomains across processors during execution of the model. Processor virtualization enables better overlap between computation and communication and improved cache efficiency. Furthermore, by employing an appropriate load balancer, we achieve better processor utilization while requiring minimal changes to the model's code.

Keywords: Load modeling; Computational modeling; Atmospheric modeling; Meteorology; Predictive models; Load management; Runtime

Some essential techniques for developing efficient petascale applications

Journal of Physics: Conference Series, 2008, Vol. 125, No. 1

Author: Kalé, L.V. Affiliation: Parallel Programming Laboratory, Computer Science Department, University of Illinois at Urbana-Champaign, Urbana, IL 61810, United States

Multiple petaflops-class machines will appear during the coming year, and many multipetaflops machines are on the anvil. It will be a substantial challenge to make existing parallel CSE applications run efficiently on them, and even more challenging to design new applications that can effectively leverage the large computational power of these machines. Multicore chips and SMP nodes are becoming popular and pose challenges of their own. Further, a new set of challenges in productivity arises, especially if we wish to have a broader set of applications and people use these machines. Reviewed here is a set of techniques that have proved useful in multiple parallel applications that have scaled to tens of thousands of processors, on machines such as the Blue Gene/L, Blue Gene/P, Cray XT3, and XT4. New challenges and potential solutions for the performance issues are identified. Issues presented by multicore chips and SMP nodes are also addressed. Also reviewed are some new and old ideas for increasing productivity in parallel programming substantially. © 2008 IOP Publishing Ltd.

inVRs - A framework for building interactive networked virtual reality systems

2nd International Conference on High Performance Computing and Communications (HPCC 2006)

Authors: Anthes, Christoph; Volkert, Jens. Affiliation: Johannes Kepler Univ Linz, GUP, Inst Graphics & Parallel Programming, A-4040 Linz, Austria

ISBN: (print) 3540393684

In recent years, a growing interest in Collaborative Virtual Environments (CVEs) can be observed. Users at different locations around the globe are able to communicate and interact in the same virtual space as if they were in the same physical location. Several approaches exist for the implementation of CVEs. General ideas for the design of Virtual Environments (VEs) are analyzed, and a novel approach in the form of a highly extensible, flexible, and modular framework, inVRs, is presented.

Keywords: Interactive computer systems

Parallelization of a level set method for simulating dendritic growth

Journal of Parallel and Distributed Computing, 2006, Vol. 66, No. 11, pp. 1379-1386

Authors: Wang, Kai; Chang, Anthony; Kale, Laxmikant V.; Dantzig, Jonathan A. Affiliations: Univ S Dakota, Dept Comp Sci, Vermillion, SD 57069, USA; Univ Illinois, Dept Comp Sci, Parallel Programming Lab, Urbana, IL 61801, USA; Shandong Econ Univ, Dept Comp Sci, Shandong 250014, Peoples R China

Processor virtualization is a parallelization technique that may be used to enhance the performance of parallel applications through improved cache performance and the overlapping of communication and computation. In this study, we use the processor virtualization technique to parallelize the level set method for solving solidification problems. Numerical results on a distributed memory machine are reported to show the performance of the resulting level set solver, and demonstrate the advantages of using processor virtualization. (C) 2006 Elsevier Inc. All rights reserved.

Keywords: processor virtualization; level set methods; MPI; AMPI; solidification

A toolbox supporting collaboration in networked virtual environments

5th International Conference on Computational Science (ICCS 2005)

Authors: Anthes, C; Volkert, J. Affiliation: Univ Linz, GUP, Inst Graph & Parallel Programming, A-4040 Linz, Austria

ISBN: (print) 3540260447

A growing interest in Collaborative Virtual Environments (CVEs) can be observed over the last few years. Geographically dislocated users share a common virtual space as if they were at the same physical location. Although Virtual Reality (VR) is heading more and more in the direction of creating lifelike environments and stimulating all of the user's senses, the technology does not yet allow communication and interaction as they occur in the real world. A more abstract representation is sufficient in most CVEs. This paper provides an overview of tools which can be used to enhance communication and interaction in CVEs by visualising behaviour. Not only is a set of tools presented and classified, but an implementation approach showing how to use these tools in a structured way, in the form of a framework, is also given.

Keywords: Virtual reality

Adaptive MPI

16th International Workshop on Languages and Compilers for Parallel Computing, LCPC 2003

Authors: Huang, Chao; Lawlor, Orion; Kalé, L.V. Affiliation: Parallel Programming Laboratory, University of Illinois at Urbana-Champaign, United States

ISBN: (print) 9783540246442

Processor virtualization is a powerful technique that enables the runtime system to carry out intelligent adaptive optimizations like dynamic resource management. Charm++ is an early language/system that supports processor virtualization. This paper describes Adaptive MPI (AMPI), an MPI implementation and extension that supports processor virtualization. AMPI implements virtual MPI processes (VPs), several of which may be mapped to a single physical processor. AMPI includes a powerful runtime support system that takes advantage of the degree of freedom afforded by allowing it to assign VPs onto processors. With this runtime system, AMPI supports such features as automatic adaptive overlap of communication and computation and automatic load balancing. It can also support other features such as checkpointing without additional user code, and the ability to shrink and expand the set of processors used by a job at runtime. This paper describes AMPI, its features, benchmarks that illustrate performance advantages and tradeoffs offered by AMPI, and application experiences. © Springer-Verlag Berlin Heidelberg 2004.

Keywords: Degrees of freedom (mechanics)

A parallel framework for explicit FEM

7th International Conference on High Performance Computing, HiPC 2000

Authors: Bhandarkar, Milind A.; Kalé, Laxmikant V. Affiliation: Parallel Programming Laboratory, Department of Computer Science, University of Illinois Urbana-Champaign, United States

ISBN: (print) 3540414290

As a part of an ongoing effort to develop a "standard library" for scientific and engineering parallel applications, we have developed a preliminary finite element framework. This framework allows an application scientist interested in modeling structural properties of materials, including dynamic behavior such as crack propagation, to develop codes that embody their modeling techniques without having to pay attention to the parallelization process. The resultant code modularly separates parallel implementation techniques from numerical algorithms. As the framework builds upon an object-based load balancing framework, it allows the resultant applications to automatically adapt to load imbalances resulting from the application or the environment (e.g. timeshared clusters). This paper presents results from the first version of the framework, and demonstrates results on a crack propagation application. © Springer-Verlag Berlin Heidelberg 2000.

Keywords: Crack propagation

Run-time support for adaptive load balancing

15 Workshops Held in Conjunction with the IEEE International Parallel and Distributed Processing Symposium, IPDPS 2000

Authors: Bhandarkar, Milind A.; Brunner, Robert K.; Kalé, Laxmikant V. Affiliation: Parallel Programming Laboratory, Department of Computer Science, University of Illinois at Urbana-Champaign, United States

ISBN: (print) 354067442X

Many parallel scientific applications have dynamic and irregular computational structure. However, most such applications exhibit persistence of computational load and communication structure. This allows us to embed a measurement-based automatic load balancing framework in the run-time systems of parallel languages that are used to build such applications. In this paper, we describe such a framework built for the Converse [4] interoperable runtime system. This framework is composed of mechanisms for recording application performance data, a mechanism for object migration, and interfaces for plug-in load balancing strategy objects. Interfaces for strategy objects allow easy implementation of novel load balancing strategies that could use application characteristics on the entire machine, or only a local neighborhood. We present the performance of a few strategies on a synthetic benchmark and also the impact of automatic load balancing on an actual application. © 2000 Springer-Verlag Berlin Heidelberg.

Keywords: Interoperability
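
To make the idea of a plug-in, measurement-based strategy more concrete, here is a minimal Python sketch. It is not the Converse framework's interface, which the abstract does not specify. The function names `greedy_strategy` and `processor_loads`, and the use of a simple greedy heuristic, are assumptions for illustration only. The sketch takes per-object loads recorded during earlier execution (relying on the persistence the abstract mentions) and returns an object-to-processor mapping.

```python
import heapq
from typing import Dict, List


def greedy_strategy(measured_load: Dict[str, float], num_procs: int) -> Dict[str, int]:
    """Hypothetical strategy: place the heaviest object first, always on the
    processor that currently has the smallest total assigned load."""
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    # Min-heap of (total load assigned so far, processor id).
    heap = [(0.0, proc) for proc in range(num_procs)]
    heapq.heapify(heap)
    placement: Dict[str, int] = {}
    # Sort by descending load; break ties by name so the result is deterministic.
    for obj, load in sorted(measured_load.items(), key=lambda kv: (-kv[1], kv[0])):
        total, proc = heapq.heappop(heap)
        placement[obj] = proc
        heapq.heappush(heap, (total + load, proc))
    return placement


def processor_loads(measured_load: Dict[str, float],
                    placement: Dict[str, int],
                    num_procs: int) -> List[float]:
    """Sum the measured load of the objects assigned to each processor."""
    loads = [0.0] * num_procs
    for obj, proc in placement.items():
        loads[proc] += measured_load[obj]
    return loads
```

In a real runtime, the returned mapping would be handed to the object-migration mechanism. A strategy restricted to a local neighborhood would only see the loads of nearby processors rather than the whole dictionary.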

Vector prefix and reduction computation on coarse-grained, distributed-memory parallel machines

1st Merged International Parallel Processing Symposium/Symposium on Parallel and Distributed Processing (IPPS/SPDP 1998)

Authors: Bae, S; Kim, D; Ranka, S. Affiliation: ETRI, Parallel Programming Sect, Taejon, South Korea

ISBN: (print) 0818684038

Vector prefix and reduction are collective communication primitives in which all processors must cooperate. We present two parallel algorithms, the direct algorithm and the split algorithm, for vector prefix and reduction computation on coarse-grained, distributed-memory parallel machines. Our algorithms are relatively architecture independent and can be used effectively in many applications such as Pack/Unpack, Array Prefix/Reduction Functions, and Array Combining Scatter Functions, which are defined in Fortran 90 and in High Performance Fortran. Experimental results on the CM-5 are presented.

Keywords: Parallel processing systems
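
For readers unfamiliar with the primitive, the following Python sketch shows what a vector prefix and a vector reduction compute. It does not reproduce the paper's direct or split algorithm, which the abstract does not detail. The processors are simulated sequentially as a list of local vectors, and the function names `vector_prefix` and `vector_reduction` are assumptions for illustration.

```python
from typing import List


def vector_prefix(vectors: List[List[int]]) -> List[List[int]]:
    """Inclusive element-wise prefix sum across simulated processors.

    vectors[p] is the local vector held by processor p. In the result,
    result[p][j] is the sum of vectors[0][j] through vectors[p][j].
    """
    if not vectors:
        return []
    running = [0] * len(vectors[0])
    result: List[List[int]] = []
    for local in vectors:
        if len(local) != len(running):
            raise ValueError("all local vectors must have the same length")
        # Build a new list each step so earlier results are not overwritten.
        running = [acc + x for acc, x in zip(running, local)]
        result.append(running)
    return result


def vector_reduction(vectors: List[List[int]]) -> List[int]:
    """Element-wise sum over all processors (the last row of the prefix)."""
    prefix = vector_prefix(vectors)
    return prefix[-1] if prefix else []
```

With three simulated processors holding `[1, 2]`, `[3, 4]`, and `[5, 6]`, the prefix gives processor 1 the vector `[4, 6]`, and the reduction is `[9, 12]`. On a real distributed-memory machine the partial sums would be exchanged through messages rather than a shared loop.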

Vector prefix and reduction computation on coarse-grained, distributed-memory parallel machines

International Symposium on Parallel Processing

Authors: Seungjo Bae; D. Kim; S. Ranka. Affiliations: Parallel Programming Section, Electronics and Telecommunications Research Institute, Daejeon, South Korea; Department of CIS, Syracuse University, Syracuse, NY, USA; Department of CISE, University of Florida, Gainesville, FL, USA

Vector prefix and reduction are collective communication primitives in which all processors must cooperate. The authors present two parallel algorithms, the direct algorithm and the split algorithm, for vector prefix and reduction computation on coarse-grained, distributed-memory parallel machines. The algorithms are relatively architecture independent and can be used effectively in many applications such as pack/unpack, array prefix/reduction functions, and array combining scatter functions, which are defined in Fortran 90 and in High Performance Fortran. Experimental results on the CM-5 are presented.

Keywords: Concurrent computing; Distributed computing; Parallel machines; Hypercubes; Computational Intelligence Society; Flyback transformers; Parallel programming; Parallel algorithms; Scattering; Binary trees
