Regional Information Center for Science and Technology - Supporting highly-decoupled thread-level redundancy for parallel programs

DocumentCode :

3207336

Title :

Supporting highly-decoupled thread-level redundancy for parallel programs

Author :

Rashid, M. Wasiur ; Huang, Michael C.

Author_Institution :

Dept. of Electr. & Comput. Eng., Univ. of Rochester, Rochester, NY

fYear :

2008

fDate :

16-20 Feb. 2008

Firstpage :

393

Lastpage :

404

Abstract :

The continued scaling of device dimensions and operating voltage reduces the critical charge, and thus the natural noise tolerance, of transistors. As a result, circuits can produce transient upsets that corrupt program execution and data. Redundant execution can detect and correct such circuit errors on the fly. The increasing prevalence of multi-core architectures makes coarse-grain thread-level redundancy (TLR) very attractive. While TLR has been extensively studied in the context of single-threaded applications, much less attention has been paid to the design issues and tradeoffs of supporting parallel codes. In this paper, we propose a microarchitecture to efficiently support TLR for parallel codes. One of the main design goals is to support a large number of unverified instructions, so that long verification latencies can be easily tolerated. Another important objective is comprehensive coverage that includes not only the computation logic but also the coherence and consistency logic in the memory subsystem. Hence, the redundant copy of the program needs to access memory independently, and the system needs to efficiently manage the non-determinism of parallel execution. The proposed architectural support for these goals is entirely off the processor critical path and can be easily disabled when redundancy is not requested. With a few effective optimizations, the design is also efficient: during error-free execution, it causes less than 3% additional performance degradation on top of the throughput loss due to redundancy.
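To make the idea of "a large number of unverified instructions" concrete, the following is a minimal software analogy, not the paper's microarchitecture. It assumes a single-threaded, deterministic `program`, so it does not model the coherence, consistency, or non-determinism issues the paper addresses. A leading copy computes results and queues them as unverified; a trailing copy re-executes each input independently and compares. The hypothetical `window` parameter bounds how far the leading copy may run ahead of verification, and `fault_at` is an illustrative hook that flips one bit of a leading result to emulate a transient upset. All names here are assumptions made for illustration.

```python
from collections import deque
from typing import Callable, Iterable, Optional, Tuple


def run_redundant(program: Callable[[int], int],
                  inputs: Iterable[int],
                  window: int = 64,
                  fault_at: Optional[int] = None) -> Tuple[int, Optional[int]]:
    """Toy leading/trailing redundant execution (illustrative sketch only).

    Returns (number_of_verified_results, index_of_first_mismatch_or_None).
    """
    pending = deque()  # (index, input, leading result) awaiting verification
    verified = 0

    def verify_oldest() -> Optional[int]:
        nonlocal verified
        i, x, lead = pending.popleft()
        if program(x) != lead:  # trailing copy re-executes independently
            return i            # mismatch: an error is detected here
        verified += 1
        return None

    for i, x in enumerate(inputs):
        result = program(x)     # leading copy runs ahead
        if i == fault_at:
            result ^= 1         # emulate a single-bit transient upset
        pending.append((i, x, result))
        if len(pending) > window:  # buffer full: verification must catch up
            bad = verify_oldest()
            if bad is not None:
                return verified, bad

    while pending:              # drain remaining unverified results
        bad = verify_oldest()
        if bad is not None:
            return verified, bad
    return verified, None
```

For example, `run_redundant(lambda x: x * x, range(100), fault_at=10)` lets the leading copy advance well past index 10 before the trailing copy catches the corrupted result. A larger `window` tolerates slower verification at the cost of detecting errors later, which loosely mirrors the latency-tolerance goal stated in the abstract.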

Keywords :

microprocessor chips; multi-threading; power aware computing; redundancy; transistors; coarse-grain thread-level redundancy; coherence logic; computation logic; consistency logic; natural noise tolerance level; parallel programs; thread-level redundancy; Circuit noise; Delay; Error correction; Logic; Memory management; Microarchitecture; Noise level; Noise reduction; Redundancy; Voltage;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

High Performance Computer Architecture, 2008. HPCA 2008. IEEE 14th International Symposium on

Conference_Location :

Salt Lake City, UT

ISSN :

1530-0897

Print_ISBN :

978-1-4244-2070-4

Type :

conf

DOI :

10.1109/HPCA.2008.4658655

Filename :

4658655

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3207336