Title :
Predictable, Efficient System-Level Fault Tolerance in C^3
Author :
Song, Jiguo ; Wittrock, John ; Parmer, Gabriel
Author_Institution :
George Washington Univ., Washington, DC, USA
Abstract :
Predictable reliability is an increasingly important aspect of embedded and real-time systems. This includes the ability to recover from unknown faults in a manner that maintains system timing guarantees, even when these faults occur within system components. This paper presents the C3 system, which is the first system implementation we know of for predictable, system-level fault tolerance that doesn't require physical redundancy. We introduce both the system design and two timing analyses that enable predictable recovery from faults in operating system components, and we identify recovery inversion as a main impediment to schedulable recovery. C3 provides fault tolerance for low-level system components using a combination of efficient μ-reboots and an interface-driven mechanism to recreate component state. C3 introduces on-demand recovery, which properly prioritizes aspects of the recovery process to avoid this inversion and to avoid inhibiting system timeliness. We compare this system to both eager recovery and checkpointing of a paravirtualized real-time OS.
Keywords :
embedded systems; operating systems (computers); software fault tolerance; system recovery; systems analysis; C3 system; embedded system; fault recovery; interface-driven mechanism; low-level system components; on-demand recovery; operating system components; real-time system; recovery inversion; recovery predictability; recovery schedulability; reliability predictability; system design; system-level fault tolerance; timing analyses; μ-reboots; Real-time systems; component-based design; fault-tolerance; operating systems; real-time; reliability
Conference_Title :
Real-Time Systems Symposium (RTSS), 2013 IEEE 34th
Conference_Location :
Vancouver, BC
DOI :
10.1109/RTSS.2013.11