Title of article
Software fault detection and recovery in critical real-time systems: An approach based on loose coupling
Author/Authors
Alho, Pekka and Mattila, Jouni
Issue Information
Journal issue with serial number, year 2014
Pages
6
From page
2272
To page
2277
Abstract
Remote handling (RH) systems are used to inspect, make changes to, and maintain components in the ITER machine and as such are an example of a mission-critical system. Failure in critical systems may cause damage, significant financial losses, and loss of experiment runtime, making dependability one of their most important properties. However, even if the software for RH control systems has been developed using best practices, the system might still fail due to undetected faults (bugs), hardware failures, etc. Critical systems therefore need the capability to tolerate faults and resume operation after they occur. However, the design of effective fault detection and recovery mechanisms poses a challenge due to timeliness requirements, growth in scale, and complex interactions. In this paper we evaluate the effectiveness of a service-oriented architectural approach to fault tolerance in mission-critical real-time systems. We use a prototype implementation for service management with an experimental RH control system and an industrial manipulator. The fault tolerance is based on using the high level of decoupling between services to recover from transient faults by service restarts. If the recovery process is not successful, the system can still be used, provided the fault was not in a critical software module.
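The recovery policy described in the abstract (restart a failed service, and keep the system usable when a non-critical service cannot be recovered) can be illustrated with a minimal sketch. This is not the authors' prototype: the `Service` type, the `recover` function, the `flaky_start` helper, the outcome labels, and the restart limit of three are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Service:
    """Hypothetical loosely coupled service managed by a supervisor."""
    name: str
    critical: bool               # True if the system cannot run without it
    start: Callable[[], bool]    # returns True if the service came up healthy


def recover(service: Service, max_restarts: int = 3) -> str:
    """Try to clear a transient fault by restarting the service.

    Returns "recovered" if a restart succeeds, "degraded" if a
    non-critical service stays down, and "halted" if a critical
    service stays down. The limit of 3 restarts is an assumption.
    """
    for _ in range(max_restarts):
        if service.start():
            return "recovered"
    return "halted" if service.critical else "degraded"


def flaky_start(failures: int) -> Callable[[], bool]:
    """Build a start function that fails a fixed number of times first."""
    remaining = [failures]

    def start() -> bool:
        if remaining[0] > 0:
            remaining[0] -= 1
            return False
        return True

    return start


if __name__ == "__main__":
    # A transient fault that clears on the third restart attempt.
    print(recover(Service("camera", critical=False, start=flaky_start(2))))
    # A persistent fault in a non-critical service: continue without it.
    print(recover(Service("logger", critical=False, start=flaky_start(5))))
    # A persistent fault in a critical service: operation must stop.
    print(recover(Service("motion", critical=True, start=flaky_start(5))))
```

Because each service is restarted in isolation, the sketch relies on the same property the abstract names: a high level of decoupling, so that restarting one service does not disturb the others.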
Keywords
ITER , remote handling , Software , dependability , Fault tolerance , Real-time
Journal title
Fusion Engineering and Design
Serial Year
2014
Record number
2362948