DocumentCode :
3207336
Title :
Supporting highly-decoupled thread-level redundancy for parallel programs
Author :
Rashid, M. Wasiur ; Huang, Michael C.
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Rochester, Rochester, NY
fYear :
2008
fDate :
16-20 Feb. 2008
Firstpage :
393
Lastpage :
404
Abstract :
The continued scaling of device dimensions and operating voltage reduces the critical charge and thus the natural noise tolerance of transistors. As a result, circuits can suffer transient upsets that corrupt program execution and data. Redundant execution can detect and correct circuit errors on the fly. The increasing prevalence of multi-core architectures makes coarse-grain thread-level redundancy (TLR) very attractive. While TLR has been extensively studied in the context of single-threaded applications, much less attention has been paid to the design issues and tradeoffs of supporting parallel codes. In this paper, we propose a microarchitecture to efficiently support TLR for parallel codes. One of the main design goals is to support a large number of unverified instructions, so that long verification latencies can be easily tolerated. Another important objective is to provide comprehensive coverage that includes not only the computation logic but also the coherence and consistency logic in the memory subsystem. Hence, the redundant copy of the program needs to access memory independently, and the system needs to efficiently manage the non-determinism of parallel execution. The proposed architectural support for these goals is entirely off the processor's critical path and can be easily disabled when redundancy is not requested. With a few effective optimizations, the design is also efficient: during error-free execution, it causes less than 3% additional performance degradation on top of the throughput loss due to redundancy.
Keywords :
microprocessor chips; multi-threading; power aware computing; redundancy; transistors; coarse-grain thread-level redundancy; coherence logic; computation logic; consistency logic; natural noise tolerance level; parallel programs; thread-level redundancy; Circuit noise; Delay; Error correction; Logic; Memory management; Microarchitecture; Noise level; Noise reduction; Redundancy; Voltage
fLanguage :
English
Publisher :
IEEE
Conference_Title :
High Performance Computer Architecture, 2008. HPCA 2008. IEEE 14th International Symposium on
Conference_Location :
Salt Lake City, UT
ISSN :
1530-0897
Print_ISBN :
978-1-4244-2070-4
Type :
conf
DOI :
10.1109/HPCA.2008.4658655
Filename :
4658655