Title :
Automatic detection of software failures: issues and experience
Author :
Savor, T. ; Seviora, R.E.
Author_Institution :
Bell Canada Software Reliability Lab., Waterloo Univ., Ont., Canada
Abstract :
The functionality of many real-time systems depends critically on their software. It is important to know whether their software operates correctly or whether failures are occurring. Knowing this would help system operators take corrective actions before minor problems escalate into major disruptions. The paper considers one approach to the automatic detection of software failures, called supervision. In this approach, a separate unit called the supervisor observes the inputs and outputs of the target program. The supervisor knows what the intended behavior of the target program is and reports deviations as failures. The focus is on event-driven, embedded real-time software. The paper first gives an overview of the major issues involved in supervision. These include the definition of correct behavior, observability of program inputs and outputs, the handling of specification nondeterminism, tradeoffs between failure detection accuracy and computational cost, and the continuation of supervision after occurrences of failures. The paper then summarizes experience obtained in supervising a control program of a small telephone exchange. The exchange and its telephones were simulated on a multiprocessor workstation. The summary includes the results obtained for failure detection capability and computational cost.
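The supervision idea in the abstract can be illustrated with a minimal sketch. It is not the authors' supervisor: the record does not describe their design, so everything below is an assumption made for illustration. The specification is modeled as a small nondeterministic state machine, and the names `TOY_SPEC`, `Supervisor`, and `observe`, as well as the states and signals, are hypothetical. The sketch tracks every state the specification allows (one simple way to cope with specification nondeterminism), reports an observed output that no allowed transition produces as a failure, and then keeps supervising from the states the specification would have permitted.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical specification format (an assumption, not from the paper):
# (state, input) -> set of (expected_output, next_state) alternatives.
Spec = Dict[Tuple[str, str], Set[Tuple[str, str]]]

# Toy telephone-like specification; all names are illustrative only.
TOY_SPEC: Spec = {
    ("idle", "offhook"): {("dialtone", "dialing")},
    # Nondeterministic: the callee may be free or busy.
    ("dialing", "digit"): {("ring", "ringing"), ("busytone", "busy")},
    ("ringing", "onhook"): {("silence", "idle")},
    ("busy", "onhook"): {("silence", "idle")},
}


class Supervisor:
    """Observes input/output pairs and reports deviations from the spec."""

    def __init__(self, spec: Spec, initial: str) -> None:
        self.spec = spec
        self.states: Set[str] = {initial}  # all states the target may be in
        self.failures: List[Tuple[str, str]] = []

    def observe(self, inp: str, out: str) -> bool:
        """Return True if the observed output is legal, False on a failure."""
        # Collect every transition the spec allows from any tracked state.
        allowed: Set[Tuple[str, str]] = set()
        for state in self.states:
            allowed |= self.spec.get((state, inp), set())

        matching = {nxt for (expected, nxt) in allowed if expected == out}
        if matching:
            self.states = matching
            return True

        # Failure: record it, then continue supervision by assuming the
        # target moved to some state the spec permitted for this input.
        self.failures.append((inp, out))
        if allowed:
            self.states = {nxt for (_, nxt) in allowed}
        return False
```

A short usage example under the same assumptions: `sup = Supervisor(TOY_SPEC, "idle")`, followed by `sup.observe("offhook", "dialtone")` and `sup.observe("digit", "busytone")`, is accepted, while `sup.observe("onhook", "dialtone")` is recorded in `sup.failures`. Tracking a set of states trades extra computation for fewer false alarms, which is one simple instance of the accuracy versus cost tradeoff the abstract mentions.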
Keywords :
multiprocessing systems; real-time systems; supervisory programs; system monitoring; telecommunication computing; telephone exchanges; automatic software failure detection; computational cost; control program; correct behavior; corrective actions; event-driven embedded real-time software; failure detection accuracy; multiprocessor workstation; program input observability; program output observability; simulation; small telephone exchange; specification nondeterminism; supervision; supervisor; system operators; target program input observation; target program output observation; Automata; Computational efficiency; Control systems; Delay; Independent component analysis; Laboratories; Software reliability; Software systems; Tellurium; Timing
Conference_Title :
Real-Time Systems, 1998. Proceedings. 10th Euromicro Workshop on
Conference_Location :
Berlin
Print_ISBN :
0-8186-8503-4
DOI :
10.1109/EMWRTS.1998.685127