DocumentCode
3207336
Title
Supporting highly-decoupled thread-level redundancy for parallel programs
Author
Rashid, M. Wasiur ; Huang, Michael C.
Author_Institution
Dept. of Electr. & Comput. Eng., Univ. of Rochester, Rochester, NY
fYear
2008
fDate
16-20 Feb. 2008
Firstpage
393
Lastpage
404
Abstract
The continued scaling of device dimensions and the operating voltage reduces the critical charge and thus the natural noise tolerance level of transistors. As a result, circuits can produce transient upsets that corrupt program execution and data. Redundant execution can detect and correct circuit errors on the fly. The increasing prevalence of multi-core architectures makes coarse-grain thread-level redundancy (TLR) very attractive. While TLR has been extensively studied in the context of single-threaded applications, much less attention has been paid to the design issues and tradeoffs of supporting parallel codes. In this paper, we propose a microarchitecture to efficiently support TLR for parallel codes. One of the main design goals is to support a large number of unverified instructions, so that long latencies in verification can be easily tolerated. Another important objective is to have comprehensive coverage that includes not only the computation logic but also the coherence and consistency logic in the memory subsystem. Hence, the redundant copy of the program needs to access memory independently, and the system needs to efficiently manage the non-determinism in parallel execution. The proposed architectural support to achieve these goals is entirely off the processor critical path and can be easily disabled when redundancy is not requested. The design, with a few effective optimizations, is also efficient in that, during error-free execution, it causes less than 3% additional performance degradation on top of the throughput loss due to redundancy.
Keywords
microprocessor chips; multi-threading; power aware computing; redundancy; transistors; coarse-grain thread-level redundancy; coherence logic; computation logic; consistency logic; natural noise tolerance level; parallel programs; thread-level redundancy; Circuit noise; Delay; Error correction; Logic; Memory management; Microarchitecture; Noise level; Noise reduction; Redundancy; Voltage
fLanguage
English
Publisher
IEEE
Conference_Titel
High Performance Computer Architecture, 2008. HPCA 2008. IEEE 14th International Symposium on
Conference_Location
Salt Lake City, UT
ISSN
1530-0897
Print_ISBN
978-1-4244-2070-4
Type
conf
DOI
10.1109/HPCA.2008.4658655
Filename
4658655