Search results - Inner Mongolia University Library

A survey of checkpointing algorithms for parallel and distributed computers

SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 2000, Vol. 25, No. 5, pp. 489-510

Authors: Kalaiselvi, S; Rajaraman, V (Indian Inst Sci, SERC, Bangalore 560012, Karnataka, India)

A checkpoint is defined as a designated place in a program at which normal processing is interrupted specifically to preserve the status information necessary to allow resumption of processing at a later time. Checkpointing is the process of saving the status information. This paper surveys the algorithms which have been reported in the literature for checkpointing parallel/distributed systems. It has been observed that most of the algorithms published for checkpointing in message passing systems are based on the seminal article by Chandy and Lamport. A large number of articles have been published in this area by relaxing the assumptions made in that article and by extending it to minimise the overheads of coordination and context saving. Checkpointing algorithms for shared memory systems primarily extend cache coherence protocols to maintain a consistent memory. All of them assume that the main memory is safe for storing the context. Recently, algorithms have been published for distributed shared memory systems, which extend the cache coherence protocols used in shared memory systems. They also, however, include methods for storing the status of distributed memory in stable storage. Most of the algorithms assume that there is no knowledge about the programs being executed. It is, however, felt that in the development of parallel programs the user has to do a fair amount of work in distributing tasks, and this information can be effectively used to simplify checkpointing and rollback recovery.

Keywords: checkpointing algorithms; parallel & distributed computing; shared memory systems; rollback recovery; fault-tolerant systems
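
As a rough illustration of the checkpoint definition in the abstract above (interrupt processing, save the status information, resume later), here is a minimal single-process sketch in Python. It is not one of the surveyed algorithms: the function names, the `fail_at` parameter, and the use of a local file as "stable storage" are all assumptions made for this example.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temp file, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)  # the old checkpoint stays valid until the rename


def load_checkpoint(path, default):
    """Return the last saved state, or `default` if no checkpoint exists."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default


def run(path, n_steps, every, fail_at=None):
    """Sum 0..n_steps-1, checkpointing every `every` steps.

    `fail_at` (hypothetical) simulates a crash just before that step.
    """
    state = load_checkpoint(path, {"step": 0, "total": 0})
    while state["step"] < n_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated failure")
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % every == 0:
            save_checkpoint(path, state)
    return state["total"]
```

Calling `run` again with the same path after a failure rolls back to the last saved step and redoes only the work done since then, which is the rollback-recovery behaviour the abstract refers to.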


Analysis of checkpointing algorithms for Primary-Backup Replication


IEEE Symposium on Computers and Communications (ISCC)

Authors: Guler, Berkin; Ozkasap, Oznur (Koc Univ, Dept Comp Engn, Istanbul, Turkey)

ISBN: (print) 9781538616291

Replication is useful for supporting fault-tolerant, reliable, and recovery-oriented distributed systems. Popular application areas include databases, P2P systems, web services and the Internet of Things. In this study, we propose utilizing the checkpointing concept to improve the efficiency of the well-known primary-backup replication protocol in distributed systems. We developed a software framework based on an in-memory replicated key-value store to evaluate various checkpointing algorithms. Using the framework over geographically distributed nodes of the PlanetLab platform, we performed extensive experiments and analysis with several different metrics, including blocking time, checkpointing time, checkpoint size and recovery time. Experimental scenarios consist of using the well-known benchmarking tool, YCSB, to perform realistic read/update queries through exemplary workloads. Our findings indicate that incremental checkpointing combined with periodic usage is the most efficient approach, with up to 30 times better system throughput and a 50% decrease in average blocking times compared to traditional primary-backup replication and other checkpointing algorithms.

Keywords: checkpointing algorithms; primary-backup replication; fault tolerance; distributed systems
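
The idea of periodic incremental checkpointing for an in-memory key-value store, as named in the abstract above, can be sketched as follows. This is only an assumption-laden illustration, not the authors' framework: the class `CheckpointedStore`, its `period` parameter, and the in-memory checkpoint list standing in for a backup replica are hypothetical, and deletions are not handled.

```python
import copy


class CheckpointedStore:
    """In-memory key-value store with one full and many incremental checkpoints."""

    def __init__(self, period):
        self.period = period    # take an incremental checkpoint every `period` updates
        self.updates = 0
        self.data = {}
        self.dirty = set()      # keys changed since the last checkpoint
        self.checkpoints = []   # ("full" | "delta", snapshot) records, oldest first

    def put(self, key, value):
        self.data[key] = value
        self.dirty.add(key)
        self.updates += 1
        if self.updates % self.period == 0:
            self.incremental_checkpoint()

    def full_checkpoint(self):
        """Copy the whole store and discard all earlier checkpoints."""
        self.checkpoints = [("full", copy.deepcopy(self.data))]
        self.dirty.clear()

    def incremental_checkpoint(self):
        """Copy only the keys changed since the last checkpoint; return how many."""
        delta = {k: copy.deepcopy(self.data[k]) for k in self.dirty}
        self.checkpoints.append(("delta", delta))
        self.dirty.clear()
        return len(delta)

    def recover(self):
        """Rebuild the state by replaying the checkpoints in order."""
        restored = {}
        for _kind, snapshot in self.checkpoints:
            restored.update(copy.deepcopy(snapshot))
        return restored
```

Each incremental checkpoint costs time and space proportional to the number of changed keys rather than to the size of the store, which is the intuition behind the smaller checkpoint sizes and blocking times; the trade-off is that recovery has to replay every delta since the last full checkpoint.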


A graph transformation-based approach for the validation of checkpointing algorithms in distributed systems


23rd IEEE International Conference on Enabling Technologies, Infrastructure for Collaborative Enterprises (WETICE)

Authors: Khlif, Houda; Kacem, Hatem Hadj; Hernandez, Saul E. Pomares; Eichler, Cedric; Kacem, Ahmed Hadj; Simon, Alberto Calixto (Univ Sfax, ReDCAD Lab, FSEGS, Sfax, Tunisia; Inst Nacl Astrofis Opt & Electr, Puebla 72840, Mexico; CNRS, LAAS, F-31400 Toulouse, France; Univ Toulouse, LAAS, F-31400 Toulouse, France; Univ Papaloapan, UNPA, Loma Bonita 68400, Oaxaca, Mexico)

ISBN: (print) 9781479942497

Autonomic Computing Systems aim to avoid human intervention and to enable distributed systems to manage themselves. One of their challenges is efficient runtime monitoring that collects information from which the system can automatically repair itself in case of failure. Quasi-Synchronous checkpointing is a well-known technique which allows processes to recover in spite of failures. Based on this technique, several checkpointing algorithms have been developed. According to the checkpoint properties detected and ensured, they are classified into: Strictly Z-Path Free (SZPF), Z-Path Free (ZPF) and Z-Cycle Free (ZCF). In the literature, simulation has been the method adopted for the performance evaluation of checkpointing algorithms. However, few works have been designed to validate their correctness. In this paper, we propose a validation approach based on graph transformation that automatically detects the previously mentioned checkpointing properties. To achieve this, we take the vector clocks resulting from the algorithm execution and model them as a causal graph. Then, we design and use transformation rules to verify whether, in such a causal graph, the algorithm is free from undesirable patterns, such as Z-paths or Z-cycles, according to the case.

Keywords: Autonomic Computing; checkpointing algorithms; Distributed Systems; Graph Transformation; Happened Before Relation; Z-cycles; Z-paths
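
The first step described in the abstract above, turning vector clocks into a causal graph, can be sketched with the standard vector-clock comparison for the happened-before relation. The sketch below covers only that step; it does not implement the paper's graph transformation rules or the Z-path and Z-cycle checks, and the function names and event labels are hypothetical.

```python
def happened_before(a, b):
    """True if vector clock `a` causally precedes `b`.

    Standard rule: every component of a is <= the matching component of b,
    and the two clocks are not identical. Clocks are equal-length tuples.
    """
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a, b):
    """True if neither clock precedes the other and they differ."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


def causal_edges(events):
    """Build the edge set of a causal graph from {event_name: vector_clock}."""
    return {
        (u, v)
        for u in events
        for v in events
        if u != v and happened_before(events[u], events[v])
    }
```

For example, with clocks `(1, 0)`, `(0, 1)` and `(2, 1)`, the first two events are concurrent and both precede the third, so the graph has two edges into the third event.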


Analysis of checkpointing algorithms for Primary-Backup Replication


IEEE Symposium on Computers and Communications

Authors: Berkin Guler; Oznur Ozkasap (Department of Computer Engineering, Koc University, Istanbul, Turkey)

Keywords: checkpointing algorithms; Primary-backup replication; Fault tolerance; Distributed systems


Checkpointing schemes for adjoint codes: Application to the meteorological model Meso-NH


SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2001, Vol. 22, No. 6, pp. 2135-2151

Author: Charpentier, I (IMAG, LMC, Projet IDOPT, F-38041 Grenoble 9, France)

The adjoint code of a nonlinear computer model calculates gradients along a trajectory that has to be known at integration time. When the storage of the whole trajectory requires too large an amount of memory, the calculation of the adjoint code is split and is done part by part from restart points called checkpoints. Griewank proposed a checkpointing method named Revolve, which provides an optimal logarithmic behavior with respect to time and memory requirements. In this work, some checkpointing schedules are proposed. Some of them correspond to special cases of Revolve. The user's preference is essential in choosing between time and memory requirements. This is a key point for adjoint codes of temporal models such as the meteorological model Meso-NH, which may be used for weather forecasts. When the computational time is the top priority, a particular checkpointing scheme allows computation of the adjoint code with at most one extra integration of the model. The memory requirement then behaves as the square root of the number of iterations of the model. Checkpointing schemes are tested on adjoint simulations of Meso-NH.

Keywords: checkpointing algorithms; adjoint codes; inverse problems; nonlinear least squares; three-dimensional (3D) meteorological simulations; leap-frog schemes
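
The trade-off stated in the abstract above (at most one extra integration of the model, with memory growing like the square root of the number of iterations) can be illustrated with a generic two-level scheme: store a checkpoint every ⌈√n⌉ steps on the forward sweep, then recompute one segment at a time on the reverse sweep. This is a sketch under assumptions, not the paper's schedules and not Revolve; the scalar logistic-map model `step`, its derivative `dstep`, and all function names are made up for the example.

```python
import math


def step(x):
    """Toy model: one iteration of a logistic map (assumed for illustration)."""
    return 2.5 * x * (1.0 - x)


def dstep(x):
    """Derivative of `step` with respect to x."""
    return 2.5 * (1.0 - 2.0 * x)


def adjoint_full(x0, n):
    """Reference: store the whole trajectory x_0..x_{n-1}, then sweep backwards."""
    traj = [x0]
    for _ in range(n - 1):
        traj.append(step(traj[-1]))
    lam = 1.0
    for x in reversed(traj):
        lam *= dstep(x)  # accumulates d x_n / d x_0
    return lam


def adjoint_checkpointed(x0, n):
    """Same gradient with about 2*sqrt(n) stored states and fewer than 2n model steps."""
    seg = math.isqrt(n - 1) + 1          # ceil(sqrt(n)) for n >= 1
    checkpoints = {}
    forward_steps = 0
    x = x0
    for i in range(n):                   # forward sweep: n model steps
        if i % seg == 0:
            checkpoints[i] = x
        x = step(x)
        forward_steps += 1
    lam = 1.0
    for start in sorted(checkpoints, reverse=True):
        end = min(start + seg, n)
        states = [checkpoints[start]]    # recompute x_start..x_{end-1}
        for _ in range(end - start - 1):
            states.append(step(states[-1]))
            forward_steps += 1
        for xi in reversed(states):      # adjoint sweep over this segment
            lam *= dstep(xi)
    stored_states = len(checkpoints) + seg
    return lam, forward_steps, stored_states
```

With n = 100 the sketch keeps 10 checkpoints plus at most 10 segment states and performs 190 model steps in total, compared with 100 stored states for the full-trajectory version.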


Clustered time warp and logic simulation


Proceedings of the ninth workshop on Parallel and distributed simulation

Authors: Hervé Avril; Carl Tropper (School of Computer Science, McGill University, Montréal, Canada H3A 2A7, and Hutchison Avenue Software Corporation, Montréal, Canada; School of Computer Science, McGill University, Montréal, Canada H3A 2A7)

ISBN: (print) 9780818671203

We present, in this paper, a hybrid algorithm which makes use of Time Warp between clusters of LPs and a sequential algorithm within the cluster. Time Warp is, of course, traditionally implemented between individual LPs. The algorithm was implemented in a digital logic simulator, and its performance compared to that of Time *** upon this platform we develop a family of three checkpointing algorithms, each of which occupies a different point in the spectrum of possible trade-offs between memory usage and execution time. The algorithms were implemented on several digital logic circuits and their speed, number of states saved and maximal memory consumption were compared to those of Time Warp. One of the algorithms saved between 35 and 50% of the maximal memory consumed by Time Warp (depending upon the number of processors used), while the other two decreased the maximal usage up to 30%. The latter two algorithms exhibited a speed comparable to Time Warp, while the first algorithm was 30-60% *** algorithms are also simpler to implement than optimal checkpointing algorithms.

Keywords: circuit analysis computing; digital logic circuits; hybrid algorithm; sequential algorithm; clustered time warp; logic CAD; time warp simulation; checkpointing algorithms; digital logic simulator; logic simulation; maximal memory consumption

