DocumentCode :
2013872
Title :
Transient Fault Tolerance on Chip Multiprocessor Based on Dual and Triple Core Redundancy
Author :
Gong, Rui ; Dai, Kui ; Wang, Zhiying
Author_Institution :
Sch. of Comput., Nat. Univ. of Defense Technol., Changsha, China
fYear :
2008
fDate :
15-17 Dec. 2008
Firstpage :
273
Lastpage :
280
Abstract :
To address the increasing susceptibility of microprocessors to transient faults, many techniques have been proposed that exploit the core redundancy of chip multiprocessors (CMPs). In these core-redundancy-based techniques, however, inter-core communication becomes critical. To reduce the inter-core communication bandwidth demand, this paper proposes two new approaches to fault tolerance: dual core redundancy (DCR) and triple core redundancy (TCR). In DCR, only store instructions are compared before commit, which largely reduces the bandwidth demand, and fault recovery is achieved by context saving and recovery. TCR applies triple modular redundancy (TMR) at the core level to efficiently exploit the core resources of CMPs for transient fault masking. In TCR, only the results of store instructions are compared, which detects transient faults while reducing the inter-core communication bandwidth demand. Once a single event upset (SEU) is detected, TCR can be reconfigured to execute on the two uncorrupted cores for fault detection. The experimental results demonstrate that, compared to the traditional transient fault recovery scheme CRTR, both DCR and TCR efficiently reduce inter-core bandwidth demand. DCR achieves transient fault recovery with a reasonable performance overhead caused by context saving. TCR occupies more core resources and has the lowest performance overhead during normal execution.
Keywords :
fault tolerance; microprocessor chips; redundancy; Transient fault tolerance; chip multiprocessors; dual core redundancy; fault detection; intercore communications; triple core redundancy; triple modular redundancy; Bandwidth; Cathode ray tubes; Context; Event detection; Fault detection; Fault tolerance; Microprocessors; Redundancy; Single event upset; Yarn; Chip Multiprocessor; Dual Core Redundancy; Transient Fault Tolerance; Triple Core Redundancy;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Dependable Computing, 2008. PRDC '08. 14th IEEE Pacific Rim International Symposium on
Conference_Location :
Taipei
Print_ISBN :
978-0-7695-3448-0
Electronic_ISBN :
978-0-7695-3448-0
Type :
conf
DOI :
10.1109/PRDC.2008.40
Filename :
4725306