CDMCR: multi-level fault-tolerant system for distributed applications in cloud

Authors: Qiang, Weizhong; Jiang, Changqing; Ran, Longbo; Zou, Deqing; Jin, Hai

Author affiliations: Huazhong Univ Sci & Technol, Serv Comp Technol & Syst Lab, Wuhan 430074, Peoples R China; Huazhong Univ Sci & Technol, Cluster & Grid Comp Lab, Sch Comp Sci & Technol, Wuhan 430074, Peoples R China

Publication: Security and Communication Networks

Year, volume, issue: 2016, Vol. 9, No. 15

Pages: 2766-2778

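Subject classification: 0810 [Engineering - Information and Communication Engineering]; 0809 [Engineering - Electronic Science and Technology (engineering or science degrees may be conferred)]; 08 [Engineering]; 0812 [Engineering - Computer Science and Technology (engineering or science degrees may be conferred)]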

Funding: National Natural Science Foundation of China; National Basic Research Program of China (973 Program) [2014CB340600]; Program for New Century Excellent Talents in University [NCET-13-0241]; Fundamental Research Funds for the Central Universities of HUST [2013TS105]

Keywords: virtual cluster; distributed applications; fault-tolerant

Abstract: Cloud computing gives users a new model for using computing infrastructure, with the ability to run parallel and distributed computations on elastic virtual clusters. However, its multi-level, complex nature makes a cloud computing system more prone to failure. In this paper, we present a multi-level fault-tolerant system for distributed applications in the cloud, named Distributed-application oriented Multi-level Checkpoint/Restart for Cloud (CDMCR). CDMCR periodically backs up the complete state of applications, including file system state, using a snapshot-based distributed checkpointing protocol. Thus, it can not only recover processes but also roll back data. We propose a multi-level recovery strategy comprising process-level recovery, virtual machine recreation, and host rescheduling, which enables comprehensive and efficient fault tolerance for different components in the cloud. We deploy CDMCR as PaaS, so that users are freed from node management and system configuration and can access the fault-tolerant service conveniently. We have implemented the system on the Xen virtualization platform and the OpenNebula cloud platform. Experiments on the prototype demonstrate the correctness of the system. Analysis shows that CDMCR causes no message loss or data loss, and that the backup time remains nearly constant as the number of nodes in the virtual cluster increases. Copyright (c) 2015 John Wiley & Sons, Ltd.
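
Illustrative sketch (not from the paper): the abstract names three recovery levels and says they cover different components in the cloud, but it does not say how a level is selected. The Python below assumes, purely for illustration, a one-to-one mapping from the failed component to a recovery level. The names `FailedComponent`, `RecoveryAction`, `RECOVERY_FOR`, and `choose_recovery` are hypothetical and do not come from CDMCR, Xen, or OpenNebula.

```python
from enum import Enum


class FailedComponent(Enum):
    """Kinds of failure considered in this sketch (hypothetical names)."""
    PROCESS = "process"
    VIRTUAL_MACHINE = "virtual machine"
    HOST = "host"


class RecoveryAction(Enum):
    """The three recovery levels listed in the abstract."""
    PROCESS_LEVEL_RECOVERY = "process-level recovery"
    VM_RECREATION = "virtual machine recreation"
    HOST_RESCHEDULING = "host rescheduling"


# Assumption: each failed component maps to exactly one recovery level.
# The abstract does not state the actual selection policy.
RECOVERY_FOR = {
    FailedComponent.PROCESS: RecoveryAction.PROCESS_LEVEL_RECOVERY,
    FailedComponent.VIRTUAL_MACHINE: RecoveryAction.VM_RECREATION,
    FailedComponent.HOST: RecoveryAction.HOST_RESCHEDULING,
}


def choose_recovery(component: FailedComponent) -> RecoveryAction:
    """Return the recovery level this sketch assigns to a failed component."""
    return RECOVERY_FOR[component]


if __name__ == "__main__":
    for component in FailedComponent:
        action = choose_recovery(component)
        print(f"{component.value} failure: {action.value}")
```

A real system would also need failure detection and a fallback when a chosen level fails; neither is described in the abstract, so both are left out here.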
