DocumentCode :
3208812
Title :
Opportunistic transient-fault detection
Author :
Gomaa, Mohamed A. ; Vijaykumar, T.N.
Author_Institution :
Sch. of Electr. & Comput. Eng., Purdue Univ., USA
fYear :
2005
fDate :
4-8 June 2005
Firstpage :
172
Lastpage :
183
Abstract :
CMOS scaling increases the susceptibility of microprocessors to transient faults. Most current proposals for transient-fault detection use full redundancy to achieve perfect coverage while incurring significant performance degradation. However, most commodity systems do not need or provide perfect coverage. A recent paper exploits this leniency to reduce the soft-error rate of the issue queue during L2 misses while incurring minimal performance degradation. Whereas the previous paper reduces the soft-error rate without using any redundancy, we target better coverage while incurring similarly minimal performance degradation by opportunistically using redundancy. We propose two semi-complementary techniques, called partial explicit redundancy (PER) and implicit redundancy through reuse (IRTR), to explore the trade-off between soft-error rate and performance. PER opportunistically exploits low-ILP phases and L2 misses to introduce explicit redundancy with minimal performance degradation. Because PER covers the entire pipeline and exploits not only L2 misses but all low-ILP phases, PER achieves better coverage than the previous work. To achieve coverage in high-ILP phases as well, we propose IRTR. Previous work exploits the phenomenon of instruction reuse to avoid redundant execution while falling back on redundant execution when there is no reuse. IRTR takes reuse to the extreme of the performance-coverage trade-off and completely avoids explicit redundancy by exploiting reuse's implicit redundancy within the main thread for fault detection, with virtually no performance degradation. Using simulations with SPEC2000, we show that PER and IRTR achieve a better trade-off between soft-error rate and performance degradation than previous schemes.
Keywords :
CMOS integrated circuits; fault tolerance; microprocessor chips; redundancy; CMOS scaling; ILP phase; SPEC2000; implicit redundancy through reuse; instruction reuse; partial explicit redundancy; performance degradation; performance-coverage trade-off; reuse implicit redundancy; soft-error rate; transient-fault detection; Degradation; Fault detection; Microprocessors; Packaging; Pipelines; Power system reliability; Proposals; Redundancy; Voltage; Yarn;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Architecture, 2005. ISCA '05. Proceedings. 32nd International Symposium on
ISSN :
1063-6897
Print_ISBN :
0-7695-2270-X
Type :
conf
DOI :
10.1109/ISCA.2005.38
Filename :
1431555