版权所有:内蒙古大学图书馆 技术提供:维普资讯• 智图
内蒙古自治区呼和浩特市赛罕区大学西街235号 邮编: 010021
作者机构:MathWorks Inc Natick MA 01760 USA N Carolina State Univ Dept Comp Sci Raleigh NC 27695 USA IBM Corp Res Triangle Pk NC 27709 USA
出 版 物:《IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS》 (IEEE Trans Parallel Distrib Syst)
年 卷 期:2013年第24卷第3期
页 面:576-586页
核心收录:
学科分类:0808[工学-电气工程] 08[工学] 0812[工学-计算机科学与技术(可授工学、理学学位)]
基 金:US National Science Foundation (NSF) [CNS0915567] NSF [CNS0915861, CNS1149445] US Army Research Office (ARO) [W911NF-10-1-0273] IBM Faculty Awards Google Research Awards Direct For Computer & Info Scie & Enginr Division Of Computer and Network Systems Funding Source: National Science Foundation
主 题:Online data compression distributed system monitoring
摘 要:Large-scale hosting infrastructures have become the fundamental platforms for many real-world systems such as cloud computing infrastructures, enterprise data centers, and massive data processing systems. However, it is a challenging task to achieve both scalability and high precision while monitoring a large number of intranode and internode attributes (e. g., CPU usage, free memory, free disk, internode network delay). In this paper, we present the design and implementation of a Resilient self-Compressive Monitoring (RCM) system for large-scale hosting infrastructures. RCM achieves scalable distributed monitoring by performing online data compression to reduce remote data collection cost. RCM provides failure resilience to achieve robust monitoring for dynamic distributed systems where host and network failures are common. We have conducted extensive experiments using a set of real monitoring data from NCSU s virtual computing lab (VCL), PlanetLab, a Google cluster, and real Internet traffic matrices. The experimental results show that RCM can achieve up to 200 percent higher compression ratio and several orders of magnitude less overhead than the existing approaches.