DocumentCode
1946448
Title
Toward efficient check-pointing and rollback under on-demand SBST in chip multi-processors
Author
Skitsas, Michael A. ; Nicopoulos, Chrysostomos A. ; Michael, Maria K.
Author_Institution
KIOS Res. Center, Univ. of Cyprus, Nicosia, Cyprus
fYear
2015
fDate
6-8 July 2015
Firstpage
110
Lastpage
115
Abstract
In-field on-line testing techniques have recently been proposed for detecting permanent faults caused by wear-out/aging-related defects that manifest during the lifetime of a system. Selective Software-Based Self-Testing (SBST) is one such paradigm; it focuses primarily on the recently stressed functional units of a multicore system at a sub-core granularity, in an attempt to reduce the application performance penalty caused by periodically testing the entire system. In this work, we extend DaemonGuard, our O/S-enabled framework for on-demand (selective) SBST, with fault recovery capabilities. Towards this goal, we propose an efficient checkpointing and rollback recovery mechanism which, upon fault detection, restores the system to the most recent valid state and resumes normal operation with the faulty core disabled, thereby yielding a healthy (but degraded) system. The work concentrates on reducing the number of stored checkpoints required when testing at a sub-core granularity, and on minimizing the recovery penalty of such a framework. We evaluate the overhead of the proposed recovery mechanism, and our results indicate a practical reduction in the number of stored checkpoints as well as a significant improvement in recovery latency for the cases where the faults are correlated with the stressed units.
Keywords
automatic test software; fault diagnosis; microprocessor chips; multiprocessing systems; DaemonGuard; O/S-enabled framework; aging-related defects; check-pointing; chip multiprocessors; fault recovery capabilities; in-field online testing techniques; multicore system; on-demand SBST; permanent fault detection; rollback recovery mechanism; selective software-based self-testing; stressed functional units; subcore granularity; wear-out defects; Aging; Built-in self-test; Checkpointing; Fault detection; Hardware; Multicore processing;
fLanguage
English
Publisher
IEEE
Conference_Titel
On-Line Testing Symposium (IOLTS), 2015 IEEE 21st International
Conference_Location
Halkidiki
Type
conf
DOI
10.1109/IOLTS.2015.7229842
Filename
7229842