• DocumentCode
    3208812
  • Title
    Opportunistic transient-fault detection
  • Author
    Gomaa, Mohamed A.; Vijaykumar, T.N.
  • Author_Institution
    Sch. of Electr. & Comput. Eng., Purdue Univ., USA
  • fYear
    2005
  • fDate
    4-8 June 2005
  • Firstpage
    172
  • Lastpage
    183
  • Abstract
    CMOS scaling increases susceptibility of microprocessors to transient faults. Most current proposals for transient-fault detection use full redundancy to achieve perfect coverage while incurring significant performance degradation. However, most commodity systems do not need or provide perfect coverage. A recent paper explores this leniency to reduce the soft-error rate of the issue queue during L2 misses while incurring minimal performance degradation. Whereas the previous paper reduces soft-error rate without using any redundancy, we target better coverage while incurring similarly minimal performance degradation by opportunistically using redundancy. We propose two semi-complementary techniques, called partial explicit redundancy (PER) and implicit redundancy through reuse (IRTR), to explore the trade-off between soft-error rate and performance. PER opportunistically exploits low-ILP phases and L2 misses to introduce explicit redundancy with minimal performance degradation. Because PER covers the entire pipeline and exploits not only L2 misses but all low-ILP phases, PER achieves better coverage than the previous work. To achieve coverage in high-ILP phases as well, we propose implicit redundancy through reuse (IRTR). Previous work exploits the phenomenon of instruction reuse to avoid redundant execution while falling back on redundant execution when there is no reuse. IRTR takes reuse to the extreme of performance-coverage trade-off and completely avoids explicit redundancy by exploiting reuse's implicit redundancy within the main thread for fault detection with virtually no performance degradation. Using simulations with SPEC2000, we show that PER and IRTR achieve a better trade-off between soft-error rate and performance degradation than the previous schemes.
  • Keywords
    CMOS integrated circuits; fault tolerance; microprocessor chips; redundancy; CMOS scaling; ILP phase; SPEC2000; implicit redundancy through reuse; instruction reuse; partial explicit redundancy; performance degradation; performance-coverage trade-off; reuse implicit redundancy; soft-error rate; transient-fault detection; Degradation; Fault detection; Microprocessors; Packaging; Pipelines; Power system reliability; Proposals; Redundancy; Voltage; Yarn;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Architecture, 2005. ISCA '05. Proceedings. 32nd International Symposium on
  • ISSN
    1063-6897
  • Print_ISBN
    0-7695-2270-X
  • Type
    conf
  • DOI
    10.1109/ISCA.2005.38
  • Filename
    1431555