• DocumentCode
    129032
  • Title
    Reliability-Aware Exceptions: Tolerating intermittent faults in microprocessor array structures
  • Author
    Dweik, Waleed ; Annavaram, Murali ; Dubois, Michel
  • Author_Institution
    Ming Hsieh Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
  • fYear
    2014
  • fDate
    24-28 March 2014
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    In future technology nodes, reliability is expected to become a first-order design constraint. Faults encountered in a chip can be classified into three categories: transient, intermittent, and permanent. Fault classification allows a chip to take the appropriate corrective action. Mechanisms have been proposed to distinguish transient from non-transient faults where all non-transient faults are handled as permanent. Intermittent faults induced by wearout phenomena have become the dominant reliability concern in nanoscale technology, yet there is no mechanism that provides finer classification of non-transient faults into intermittent and permanent faults. In this paper, we present a new class of exceptions called Reliability-Aware Exceptions (RAEs) which provide the ability to distinguish intermittent faults in microprocessor array structures. The RAE handlers have the ability to manipulate microprocessor array structures to recover from all three categories of faults. Using RAEs, we demonstrate that the reliability of two representative microarchitecture structures, load/store queue and reorder buffer in an out-of-order processor, is improved by average factors of 1.3 and 1.95, respectively.
  • Keywords
    fault diagnosis; fault tolerance; integrated circuit reliability; microprocessor chips; RAEs; fault classification; first-order design constraint; intermittent fault tolerance; load-store queue; microprocessor array structures; nanoscale technology; nontransient faults; out-of-order processor; permanent faults; reliability-aware exceptions; reorder buffer; representative microarchitecture structures; wearout phenomena; Arrays; Circuit faults; Computational modeling; Microprocessors; Radiation detectors; Reliability; Transient analysis; array structure; de-configuration; fault injection; intermittent fault
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Design, Automation and Test in Europe Conference and Exhibition (DATE), 2014
  • Conference_Location
    Dresden
  • Type
    conf
  • DOI
    10.7873/DATE.2014.114
  • Filename
    6800315