版权所有:内蒙古大学图书馆 技术提供:维普资讯• 智图
内蒙古自治区呼和浩特市赛罕区大学西街235号 邮编: 010021
作者机构:RAS SB ICMMG Supercomp Software Dept Novosibirsk 630090 Russia Univ Paris 11 LAL CNRS F-91898 Orsay France LRI PCRI F-91405 Orsay France
出 版 物:《FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE》 (下代计算机系统)
年 卷 期:2005年第21卷第5期
页 面:709-715页
核心收录:
学科分类:08[工学] 0812[工学-计算机科学与技术(可授工学、理学学位)]
主 题:Channel Memory message passing interface fault tolerance Global Computing grid
摘 要:Fault tolerant message passing environments protect parallel applications against node failures. Very large scale computing systems, ranging from large clusters to worldwide Global Computing systems, require a high level of fault tolerance in order to efficiently run parallel applications. The Channel Memory approach provides the infrastructure for scalable tolerance to simultaneous faults. Along with a specially designed checkpointing system and recovery protocol, this approach has resulted in the MPICH-V architecture. In this paper, we describe CMDE - a stand-alone distributed program system based on MPICH-V architecture and implementing an approach to tolerate faults of Channel Memories. (c) 2004 Elsevier B.V. All rights reserved.