DocumentCode :
392387
Title :
A scalable on-line multilevel distributed network fault detection/monitoring system based on the SNMP protocol
Author :
Su, Ming-Shan ; Thulasiraman, K. ; Das, Anindya
Author_Institution :
Dept. of Comput. Sci. & Technol., Southeastern Oklahoma State Univ., Durant, OK, USA
Volume :
2
fYear :
2002
fDate :
17-21 Nov. 2002
Firstpage :
1960
Abstract :
Traditional centralized network management solutions do not scale to present-day large-scale computer/communication networks. Decentralized/distributed solutions can solve some of these problems (Goldszmidt, G. and Yemini, Y., 1995), and thus there is considerable interest in distributed/decentralized network management applications. We present the design and evaluation of an SNMP-based distributed network fault detection/monitoring system. We integrate into the SNMP framework our ML-ADSD algorithm (Su, M.-S. et al., Proc. 39th Annual Allerton Conf. on Commun., Control, and Computers, 2001; Su, "Multilevel distributed diagnosis and the design of a distributed network fault detection system based on the SNMP protocol", Ph.D. Thesis, School of Computer Science, University of Oklahoma, 2002) for fault diagnosis in a distributed processor system. The algorithm uses the multilevel paradigm and requires only minor modifications to be scalable to networks of varying sizes. The system is fault tolerant, allowing processor failure and/or recovery during the diagnosis process. We have implemented the system on an Ethernet network of 32 machines. Our results show that the diagnosis latency (or time to termination) is much better than that of earlier solutions. Also, the system's bandwidth utilization is insignificant, demonstrating the practicality of its deployment in a real network. We have successfully integrated three modern disciplines: network management, distributed computing and system level diagnosis.
Keywords :
computer network management; computerised monitoring; fault diagnosis; fault tolerance; protocols; Ethernet network; SNMP; bandwidth utilization; communication networks; computer networks; decentralized network management; distributed computing; distributed network fault detection; distributed network management; distributed network monitoring; fault tolerance; multilevel network fault detection; multilevel network monitoring; processor failure; processor recovery; simple network management protocol; system level diagnosis; Application software; Communication networks; Computer network management; Computer networks; Computerized monitoring; Distributed computing; Fault detection; Fault diagnosis; Large-scale systems; Protocols;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Global Telecommunications Conference, 2002. GLOBECOM '02. IEEE
Print_ISBN :
0-7803-7632-3
Type :
conf
DOI :
10.1109/GLOCOM.2002.1188542
Filename :
1188542