Title :
Resilient Self-Compressive Monitoring for Large-Scale Hosting Infrastructures
Author :
Yongmin Tan ; Vivek Venkatesh ; Xiaohui Gu
Author_Institution :
MathWorks Inc., Natick, MA, USA
Abstract :
Large-scale hosting infrastructures have become the fundamental platforms for many real-world systems such as cloud computing infrastructures, enterprise data centers, and massive data processing systems. However, it is a challenging task to achieve both scalability and high precision while monitoring a large number of intranode and internode attributes (e.g., CPU usage, free memory, free disk, internode network delay). In this paper, we present the design and implementation of a Resilient self-Compressive Monitoring (RCM) system for large-scale hosting infrastructures. RCM achieves scalable distributed monitoring by performing online data compression to reduce remote data collection cost. RCM provides failure resilience to achieve robust monitoring for dynamic distributed systems where host and network failures are common. We have conducted extensive experiments using a set of real monitoring data from NCSU's virtual computing lab (VCL), PlanetLab, a Google cluster, and real Internet traffic matrices. The experimental results show that RCM can achieve an up to 200 percent higher compression ratio and several orders of magnitude less overhead than existing approaches.
Keywords :
data compression; distributed processing; monitoring; Google cluster; Internet traffic matrices; NCSU virtual computing lab; PlanetLab; RCM system; VCL; distributed monitoring; dynamic distributed systems; internode attributes; intranode attributes; large-scale hosting infrastructures; online data compression; remote data collection cost reduction; resilient self-compressive monitoring; Data compression; Distributed databases; Image coding; Measurement; Monitoring; Peer to peer computing; Training; Online data compression; distributed system monitoring
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
DOI :
10.1109/TPDS.2012.167