DocumentCode :
159487
Title :
Exploration of system availability during software-based self-testing in many-core systems under test latency constraints
Author :
Skitsas, Michael A. ; Nicopoulos, Chrysostomos A. ; Michael, Maria K.
Author_Institution :
KIOS Res. Center, Univ. of Cyprus, Nicosia, Cyprus
fYear :
2014
fDate :
1-3 Oct. 2014
Firstpage :
33
Lastpage :
39
Abstract :
As technology scales, the increased vulnerability of modern systems due to unreliable components becomes a major problem in the era of multi-/many-core architectures. Recently, several on-line testing techniques have been proposed, aiming towards error detection of wear-out/aging-related defects that can appear during the lifetime of a system. In this work, we investigate the relation between system test latency and test-time overhead in multi-/many-core systems with shared Last-Level Cache (LLC) for periodic Software-Based Self-Testing (SBST), under different test scheduling policies. The investigated scheduling policies primarily vary the number of cores concurrently under test in the overall system testing session. Our extensive, workload-driven dynamic exploration reveals that there is an inverse relation between the two test measures; as the number of cores concurrently under test increases, system test latency decreases, but at the cost of significantly increased test time, which sacrifices system availability for running normal workloads. Under given system test latency constraints, which should be utilized in order to be able to control system recovery time in the event of an error detection, our exploration framework identifies the scheduling policy under which overall test time overhead is minimized and, hence, system availability is maximized. Without any loss of generality, a 16-core system is explored in a full-system, execution-driven simulation framework running multi-threaded PARSEC workloads [1].
Keywords :
automatic testing; cache storage; error detection; multiprocessing systems; program testing; 16-core system; LLC; SBST; error detection; last level cache; many-core systems; multicore architectures; multithreaded PARSEC workloads; software-based self-testing; system availability; test latency constraints; test scheduling policy; test-time overhead; wear-out-aging-related defects; workload-driven dynamic exploration; Availability; Benchmark testing; Measurement; Optimization; Processor scheduling; Scheduling;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), 2014 IEEE International Symposium on
Conference_Location :
Amsterdam
Print_ISBN :
978-1-4799-6154-2
Type :
conf
DOI :
10.1109/DFT.2014.6962088
Filename :
6962088