Title :
Monitoring continuous state violation in datacenters: Exploring the time dimension
Author :
Meng, Shicong ; Wang, Ting ; Liu, Ling
Author_Institution :
Coll. of Comput., Georgia Inst. of Technol., Atlanta, GA, USA
Abstract :
Monitoring the global state of an application deployed over distributed nodes has become prevalent in today's datacenters. State monitoring requires not only correct monitoring results but also minimal communication cost for efficiency and scalability. Most existing work adopts an instantaneous state monitoring approach, which triggers a state alert whenever a constraint is violated. Such an approach, however, may cause frequent and unnecessary state alerts due to unpredictable bursts in monitored values and momentary outliers, both of which are common in large-scale Internet applications. These false alerts may in turn lead to expensive and problematic countermeasures. To address this issue, we introduce window-based state monitoring in this paper. Window-based state monitoring evaluates whether a state violation is continuous within a time window, and thus gains immunity to short-term value bursts and outliers. Furthermore, we find that exploiting the monitoring time window at distributed nodes achieves significant communication savings over instantaneous monitoring. Based on this finding, we develop WISE, a system that efficiently performs WIndow-based StatE monitoring at datacenter scale. WISE features three sets of techniques. First, WISE uses distributed filtering time windows and intelligently avoids collecting global information to achieve communication efficiency, while still guaranteeing monitoring correctness. Second, WISE provides a suite of performance tuning techniques that minimize communication cost based on a sophisticated cost model. Third, WISE employs a set of novel performance optimization techniques. Extensive experiments over both real-world and synthetic traces show that WISE achieves a 50% to 90% reduction in communication cost compared with existing instantaneous monitoring approaches and simple alternative schemes.
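The difference between instantaneous and window-based alerting described in the abstract can be illustrated with a minimal single-stream sketch. This is not the WISE system or its distributed protocol; the function names, the threshold, and the window length below are hypothetical and chosen only for illustration. The sketch assumes one monitored value per time unit and treats a violation as "value strictly greater than the threshold".

```python
from typing import Iterable, List


def instantaneous_alerts(values: Iterable[float], threshold: float) -> List[int]:
    """Return every time index at which the value exceeds the threshold."""
    return [t for t, v in enumerate(values) if v > threshold]


def window_alerts(values: Iterable[float], threshold: float, window: int) -> List[int]:
    """Return time indices at which the value has exceeded the threshold
    for `window` consecutive samples, counting the current sample."""
    if window < 1:
        raise ValueError("window must be at least 1")
    alerts = []
    run = 0  # length of the current unbroken run of violations
    for t, v in enumerate(values):
        run = run + 1 if v > threshold else 0
        if run >= window:
            alerts.append(t)
    return alerts


# Hypothetical trace: one momentary outlier, then a sustained violation.
trace = [1, 9, 1, 9, 9, 9, 1]
print(instantaneous_alerts(trace, threshold=5))      # [1, 3, 4, 5]
print(window_alerts(trace, threshold=5, window=3))   # [5]
```

In this toy trace the instantaneous check fires on the isolated spike at index 1, while the window-based check fires only once the violation has persisted for three consecutive samples.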
Keywords :
computer centres; computerised monitoring; optimisation; statistics; continuous state violation; data centers; filtering time windows; instantaneous state monitoring approach; momentary outliers; performance optimization techniques; performance tuning techniques; state monitoring; time dimension; unpredictable monitored value bursts; violation monitoring; window-based state monitoring; Application software; Computerized monitoring; Costs; Distributed computing; Educational institutions; Information filtering; Information filters; Internet; Large-scale systems; Scalability;
Conference_Title :
Data Engineering (ICDE), 2010 IEEE 26th International Conference on
Conference_Location :
Long Beach, CA
Print_ISBN :
978-1-4244-5445-7
Electronic_ISBN :
978-1-4244-5444-0
DOI :
10.1109/ICDE.2010.5447923