Regional Information Center for Science and Technology - Exploration of system availability during software-based self-testing in many-core systems under test latency constraints

DocumentCode :

159487

Title :

Exploration of system availability during software-based self-testing in many-core systems under test latency constraints

Author :

Skitsas, Michael A. ; Nicopoulos, Chrysostomos A. ; Michael, Maria K.

Author_Institution :

KIOS Res. Center, Univ. of Cyprus, Nicosia, Cyprus

fYear :

2014

fDate :

1-3 Oct. 2014

Firstpage :

Lastpage :

Abstract :

As technology scales, the increased vulnerability of modern systems due to unreliable components becomes a major problem in the era of multi-/many-core architectures. Recently, several on-line testing techniques have been proposed, aiming towards error detection of wear-out/aging-related defects that can appear during the lifetime of a system. In this work, we investigate the relation between system test latency and test-time overhead in multi-/many-core systems with shared Last-Level Cache (LLC) for periodic Software-Based Self-Testing (SBST), under different test scheduling policies. The investigated scheduling policies primarily vary the number of cores concurrently under test in the overall system testing session. Our extensive, workload-driven dynamic exploration reveals that there is an inverse relation between the two test measures; as the number of cores concurrently under test increases, system test latency decreases, but at the cost of significantly increased test time, which sacrifices system availability for running normal workloads. Under given system test latency constraints, which should be utilized in order to be able to control system recovery time in the event of an error detection, our exploration framework identifies the scheduling policy under which overall test-time overhead is minimized and, hence, system availability is maximized. Without any loss of generality, a 16-core system is explored in a full-system, execution-driven simulation framework running multi-threaded PARSEC workloads [1].
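
The selection step described in the abstract (choose, among the scheduling policies that meet a test latency constraint, the one with the lowest test-time overhead) can be illustrated with a minimal sketch. This is not the authors' exploration framework: the `PolicyMeasurement` type, the `select_policy` function, and every number below are hypothetical, chosen only to mimic the inverse relation the abstract reports.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PolicyMeasurement:
    """One scheduling policy and its two measures (hypothetical units)."""
    cores_under_test: int      # cores tested concurrently under this policy
    test_latency: float        # time until every core in the system is tested
    test_time_overhead: float  # total core time taken away from normal workloads


def select_policy(
    measurements: Iterable[PolicyMeasurement], latency_limit: float
) -> Optional[PolicyMeasurement]:
    """Return the feasible policy with the lowest overhead, or None if none fits."""
    feasible = [m for m in measurements if m.test_latency <= latency_limit]
    if not feasible:
        return None
    return min(feasible, key=lambda m: m.test_time_overhead)


# Made-up values for a 16-core system: latency falls and overhead rises
# as more cores are tested at once. These are NOT results from the paper.
MEASUREMENTS = [
    PolicyMeasurement(1, 160.0, 100.0),
    PolicyMeasurement(2, 90.0, 115.0),
    PolicyMeasurement(4, 55.0, 140.0),
    PolicyMeasurement(8, 35.0, 190.0),
    PolicyMeasurement(16, 25.0, 280.0),
]

if __name__ == "__main__":
    best = select_policy(MEASUREMENTS, latency_limit=60.0)
    print(best)
```

With a latency limit of 60 in these made-up units, the policies testing 4, 8, and 16 cores concurrently are feasible, and the 4-core policy is returned because it has the smallest overhead among them. A tighter limit pushes the choice towards more concurrent testing and therefore lower availability.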

Keywords :

automatic testing; cache storage; error detection; multiprocessing systems; program testing; 16-core system; LLC; SBST; error detection; last level cache; many-core systems; multicore architectures; multithreaded PARSEC workloads; software-based self-testing; system availability; test latency constraints; test scheduling policy; test-time overhead; wear-out-aging-related defects; workload-driven dynamic exploration; Availability; Benchmark testing; Measurement; Optimization; Processor scheduling; Scheduling

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), 2014 IEEE International Symposium on

Conference_Location :

Amsterdam

Print_ISBN :

978-1-4799-6154-2

Type :

conf

DOI :

10.1109/DFT.2014.6962088

Filename :

6962088

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=159487