DocumentCode :
129032
Title :
Reliability-Aware Exceptions: Tolerating intermittent faults in microprocessor array structures
Author :
Dweik, Waleed ; Annavaram, Murali ; Dubois, Michel
Author_Institution :
Ming Hsieh Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
fYear :
2014
fDate :
24-28 March 2014
Firstpage :
1
Lastpage :
6
Abstract :
In future technology nodes, reliability is expected to become a first-order design constraint. Faults encountered in a chip can be classified into three categories: transient, intermittent, and permanent. Fault classification allows a chip to take the appropriate corrective action. Mechanisms have been proposed to distinguish transient from non-transient faults where all non-transient faults are handled as permanent. Intermittent faults induced by wearout phenomena have become the dominant reliability concern in nanoscale technology, yet there is no mechanism that provides finer classification of non-transient faults into intermittent and permanent faults. In this paper, we present a new class of exceptions called Reliability-Aware Exceptions (RAEs) which provide the ability to distinguish intermittent faults in microprocessor array structures. The RAE handlers have the ability to manipulate microprocessor array structures to recover from all three categories of faults. Using RAEs, we demonstrate that the reliability of two representative microarchitecture structures, load/store queue and reorder buffer in an out-of-order processor, is improved by average factors of 1.3 and 1.95, respectively.
Keywords :
fault diagnosis; fault tolerance; integrated circuit reliability; microprocessor chips; RAEs; fault classification; first-order design constraint; intermittent fault tolerance; load-store queue; microprocessor array structures; nanoscale technology; nontransient faults; out-of-order processor; permanent faults; reliability-aware exceptions; reorder buffer; representative microarchitecture structures; wearout phenomena; Arrays; Circuit faults; Computational modeling; Microprocessors; Radiation detectors; Reliability; Transient analysis; array structure; de-configuration; fault injection; intermittent fault
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Design, Automation and Test in Europe Conference and Exhibition (DATE), 2014
Conference_Location :
Dresden
Type :
conf
DOI :
10.7873/DATE.2014.114
Filename :
6800315