Title :
Preventing network instability caused by propagation of control plane poison messages
Author :
Du, Xiaojiang ; Shayman, Mark A. ; Skoog, Ronald A.
Author_Institution :
Dept. of Electr. & Comput. Eng., Maryland Univ., College Park, MD, USA
Abstract :
We present a framework of fault management for a particular type of failure propagation that we refer to as "poison message failure propagation": some or all of the network elements have a software or protocol "bug" that is activated on receipt of a certain network control/management message (the poison message). The activated "bug" causes the node to fail with some probability. If the network control or management is such that this message is persistently passed among the network nodes, and if the node failure probability is sufficiently high, large-scale instability can result. To mitigate this problem, we propose a combination of passive diagnosis and active diagnosis. Passive diagnosis includes protocol analysis of messages received and sent by failed nodes, correlation of messages among multiple failed nodes, and analysis of the pattern of failure propagation. This is combined with active diagnosis, in which filters are dynamically configured to block suspect protocols or message types. OPNET simulations show the effectiveness of passive diagnosis. Message filtering is formulated as a sequential decision problem, and a heuristic policy is proposed for this problem.
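The interplay described in the abstract (a message that fails nodes with some probability while being passed on, message correlation across failed nodes, and filters that block a suspect message type) can be illustrated with a small toy model. The sketch below is not the authors' OPNET setup, FSM analysis, or heuristic filtering policy. It assumes a simple flooding model in which every node that accepts the message processes it (failing with probability `p_fail`) and forwards it once to its neighbours. The names `POISON`, `propagate`, `suspect_type`, and the message-type labels in the demo logs are all hypothetical.

```python
import random
from collections import Counter, deque

POISON = "POISON"  # hypothetical label for the message type that triggers the bug


def propagate(adj, entry, p_fail, filters=frozenset(), seed=0):
    """Flood one poison message from `entry` over the graph `adj` (toy model).

    Assumption: each node that accepts the message processes it, fails with
    probability p_fail, and forwards it once to all neighbours. Nodes in
    `filters` drop the message on arrival (active diagnosis), so they neither
    fail nor forward it. Returns the set of failed nodes.
    """
    rng = random.Random(seed)
    seen, failed = {entry}, set()
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        if node in filters:
            continue  # filter blocks the suspect type: no processing, no forwarding
        if rng.random() < p_fail:
            failed.add(node)  # bug triggered; assume the message was still passed on
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return failed


def suspect_type(logs, failed):
    """Passive diagnosis by correlation (hypothetical heuristic): return the
    message type received by the largest number of failed nodes."""
    counts = Counter()
    for node in failed:
        counts.update(set(logs[node]))  # count each type at most once per node
    return counts.most_common(1)[0][0] if counts else None


if __name__ == "__main__":
    # Six-node ring; node i is adjacent to i-1 and i+1 (mod 6).
    ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
    failed = propagate(ring, entry=0, p_fail=1.0)
    # Made-up per-node logs: only POISON is common to every failed node.
    logs = {n: ["LSA", POISON] if n % 2 else ["UPDATE", POISON] for n in ring}
    suspect = suspect_type(logs, failed)
    print(sorted(failed), suspect)
    # Filtering at the entry node's two neighbours confines the failure.
    print(sorted(propagate(ring, 0, 1.0, filters=frozenset({1, 5}))))
```

With `p_fail=1.0` every node on the ring fails, correlation singles out `POISON`, and installing filters only on the entry node's neighbours limits the damage to the entry node itself. In this toy model, choosing *where* and *which* filters to install is the decision that the paper treats as a sequential decision problem.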
Keywords :
digital simulation; fault diagnosis; protocols; stability; telecommunication control; telecommunication network management; FSM method; IP networks; OPNET simulations; active diagnosis; control plane poison message propagation; fault management; heuristic policy; large-scale instability; message correlation; message filtering; network control/management message; network elements; network instability prevention; network nodes; node failure probability; passive diagnosis; poison message failure propagation; protocol analysis; protocol bug; sequential decision problem; software bug; Educational institutions; Engineering management; Failure analysis; Large-scale systems; Pattern analysis; Protocols; Switches; Technology management; Telecommunication switching; Toxicology;
Conference_Title :
MILCOM 2002. Proceedings
Print_ISBN :
0-7803-7625-0
DOI :
10.1109/MILCOM.2002.1180420