Title :
Scalable Real-Time Monitoring for Distributed Applications
Author :
Yuen, C.-H. Philip; Chan, S.-H. Gary
Author_Institution :
Dept. of Comput. Sci. & Eng., Hong Kong Univ. of Sci. & Technol., Kowloon, China
Abstract :
In order to assess the service quality of a networked application (such as a streaming session), distributed monitoring servers need to continuously collect application-specific performance metrics in real time. Much previous work addresses this by using a distributed aggregation tree (DAT) rooted at each monitor. However, this approach often leads to high monitoring delay and network stress. In this paper, we study a highly scalable monitoring network for distributed applications. In this network, distributed monitors collect application performance in two steps: first, client applications report their performance to proxies over a client overlay, and then the proxies report the performance to the distributed monitors over a separate proxy overlay. We first formulate the problem of constructing overlays that minimize monitoring delay and show that it is NP-hard. We then present a simple, efficient, and scalable monitoring algorithm called SMon, which continuously reduces the network diameter in real time in a distributed manner. Through simulations and experimental measurements on an actual implementation, we show that SMon achieves low monitoring delay, network stress, and protocol overhead for distributed applications.
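The two-step reporting path described in the abstract can be made concrete with a small delay model. The sketch below is not SMon and not the authors' problem formulation. It assumes that each overlay is an undirected graph with per-link delays, that a client's report takes the cheapest client-overlay route to some proxy, and that the proxy then forwards it over the proxy overlay to every monitor. It also computes the overlay diameter (largest shortest-path delay between any two nodes), the quantity the abstract says SMon reduces. All function names (`shortest_delays`, `overlay_diameter`, `two_tier_delay`, `undirected`) and the sample topology are hypothetical.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

# Hypothetical overlay representation: node -> list of (neighbour, link delay in ms).
Graph = Dict[str, List[Tuple[str, float]]]


def undirected(edges: Iterable[Tuple[str, str, float]]) -> Graph:
    """Build a symmetric adjacency list from (u, v, delay) triples."""
    g: Graph = {}
    for u, v, w in edges:
        g.setdefault(u, []).append((v, w))
        g.setdefault(v, []).append((u, w))
    return g


def shortest_delays(graph: Graph, source: str) -> Dict[str, float]:
    """Dijkstra: minimum end-to-end delay from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for nbr, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def overlay_diameter(graph: Graph) -> float:
    """Largest shortest-path delay between any pair of (connected) overlay nodes."""
    return max(max(shortest_delays(graph, s).values()) for s in graph)


def two_tier_delay(client_overlay: Graph, proxy_overlay: Graph,
                   clients: List[str], proxies: List[str],
                   monitors: List[str]) -> float:
    """Worst case, over clients, of the delay for a report to reach all monitors.

    Assumed model: a client picks the proxy that minimises
    (client-overlay delay to proxy) + (proxy-overlay delay to its farthest monitor).
    """
    # Step 2 cost: from each proxy to the farthest monitor over the proxy overlay.
    proxy_cost = {}
    for p in proxies:
        d = shortest_delays(proxy_overlay, p)
        proxy_cost[p] = max(d[m] for m in monitors)

    worst = 0.0
    for c in clients:
        # Step 1 cost: client to each reachable proxy over the client overlay.
        d = shortest_delays(client_overlay, c)
        best = min(d[p] + proxy_cost[p] for p in proxies if p in d)
        worst = max(worst, best)
    return worst


if __name__ == "__main__":
    client_ov = undirected([("c1", "c2", 10.0), ("c2", "p1", 5.0), ("c1", "p2", 30.0)])
    proxy_ov = undirected([("p1", "p2", 20.0), ("p1", "m1", 10.0), ("p2", "m2", 10.0)])
    print(two_tier_delay(client_ov, proxy_ov, ["c1", "c2"], ["p1", "p2"], ["m1", "m2"]))  # → 45.0
    print(overlay_diameter(proxy_ov))  # → 40.0
```

Running Dijkstra from every node is the simplest way to get all-pairs delays for a toy topology; it is a centralized calculation used only to illustrate the quantities involved, whereas the abstract describes SMon as operating in a distributed manner.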
Keywords :
computational complexity; file servers; monitoring; peer-to-peer computing; quality of service; trees (mathematics); DAT; NP-hard problem; SMon; application-specific performance metrics; client applications; distributed aggregation tree; distributed applications; distributed monitoring servers; monitoring delay minimization; network stress; peer-to-peer approach; protocol overhead; proxy overlay; real-time monitoring; service quality; streaming session; Distributed programming; Monitoring; Peer to peer computing; Real time systems; Real-time systems; Scalability; Distributed protocol; peer-to-peer network; proxies; real-time network monitoring;
Journal_Title :
IEEE Transactions on Parallel and Distributed Systems
DOI :
10.1109/TPDS.2012.60