Title :
Efficient fault diagnosis using incremental alarm correlation and active investigation for internet and overlay networks
Author :
Tang, Yongning ; Al-Shaer, Ehab ; Boutaba, Raouf
Author_Institution :
Sch. of Inf. Technol., Illinois State Univ., Normal, IL
fDate :
3/1/2008 12:00:00 AM
Abstract :
Fault localization is the core element of fault management. A symptom-fault map is commonly used to describe symptom-fault causality in fault reasoning. For Internet service networks, a well-designed monitoring system can effectively correlate the observable symptoms (i.e., alarms) with the critical network faults (e.g., link failure). However, lost and spurious symptoms can significantly degrade the performance and accuracy of a passive fault localization system. For overlay networks, the limited accessibility of the underlying network, together with overlay scalability and dynamics, makes it impractical to build a static overlay symptom-fault map. In this paper, we first propose a novel active integrated fault reasoning (AIR) framework that incrementally incorporates active investigation actions into the passive fault reasoning process, based on an extended symptom-fault-action (SFA) model. Second, we propose an overlay network profile (ONP) to facilitate the dynamic creation of an overlay symptom-fault-action (O-SFA) model, so that the AIR framework can be applied seamlessly to overlay networks (O-AIR). The corresponding fault reasoning and action selection algorithms are elaborated. Extensive simulations and Internet experiments show that AIR and O-AIR can significantly improve both the accuracy and the performance of fault reasoning for Internet and overlay service networks, especially when the ratio of lost and spurious symptoms is high.
Keywords :
Internet; computer network management; computer network reliability; fault diagnosis; active integrated fault reasoning; fault localization; fault management; incremental alarm correlation; network accessibility; overlay network profile; overlay service networks; passive fault reasoning process; symptom-fault map; Computer science; Computerized monitoring; Condition monitoring; Degradation; IP networks; Information technology; Management information systems; Scalability; Web and internet services;
Journal_Title :
Network and Service Management, IEEE Transactions on
DOI :
10.1109/TNSM.2008.080104