• DocumentCode
    1946448
  • Title
    Toward efficient check-pointing and rollback under on-demand SBST in chip multi-processors
  • Author
    Skitsas, Michael A. ; Nicopoulos, Chrysostomos A. ; Michael, Maria K.
  • Author_Institution
    KIOS Res. Center, Univ. of Cyprus, Nicosia, Cyprus
  • fYear
    2015
  • fDate
    6-8 July 2015
  • Firstpage
    110
  • Lastpage
    115
  • Abstract
    In-field on-line testing techniques have recently been proposed for detecting permanent faults caused by wear-out/aging-related defects that manifest during the lifetime of a system. Selective Software-Based Self-Testing (SBST) is one such paradigm; it focuses primarily on the recently stressed functional units of a multicore system at a sub-core granularity, in an attempt to reduce the application performance penalty caused by periodically testing the entire system. In this work, we extend our O/S-enabled framework for on-demand (selective) SBST, DeamonGuard, with fault recovery capabilities. Towards this goal, we propose an efficient checkpointing and rollback recovery mechanism which, upon fault detection, restores the system to the most recent valid state and resumes normal operation with the faulty core disabled, thereby leading to a healthy (but degraded) system. The work in this paper concentrates on reducing the number of stored checkpoints required when testing at a sub-core granularity, and on minimizing the recovery penalty of such a framework. We evaluate the overhead of the proposed recovery mechanism, and our results indicate a practical reduction in the number of stored checkpoints as well as a significant improvement in recovery latency for cases where the faults are correlated with the stressed units.
  • Keywords
    automatic test software; fault diagnosis; microprocessor chips; multiprocessing systems; DeamonGuard; O/S-enabled framework; aging-related defects; check-pointing; chip multiprocessors; fault recovery capabilities; in-field online testing techniques; multicore system; on-demand SBST; permanent fault detection; rollback recovery mechanism; selective software-based self-testing; stressed functional units; subcore granularity; wear-out defects; Aging; Built-in self-test; Checkpointing; Fault detection; Hardware; Multicore processing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    On-Line Testing Symposium (IOLTS), 2015 IEEE 21st International
  • Conference_Location
    Halkidiki
  • Type
    conf
  • DOI
    10.1109/IOLTS.2015.7229842
  • Filename
    7229842