DocumentCode :
3025774
Title :
EnHTM: Exploiting Hardware Transaction Memory for Achieving Low-Cost Fault Tolerance
Author :
Jianli Li ; Qingping Tan ; Lanfang Tan
Author_Institution :
Sch. of Comput., Nat. Univ. of Defense Technol., Changsha, China
fYear :
2013
fDate :
29-30 June 2013
Firstpage :
550
Lastpage :
554
Abstract :
Fault tolerance has become an essential concern for processor designers due to increasing transient fault rates, even for processors used in mainstream computing. Because the mainstream commodity market accepts only low-cost fault-tolerance solutions, traditional high-end solutions are unacceptable due to their high overheads. This paper presents EnHTM, a hybrid software/hardware low-cost fault-tolerance solution for serial programs running on commodity systems. EnHTM employs a lightweight symptom-based mechanism to detect faults, and recovers from them using a minimally modified Hardware Transactional Memory (HTM) that features lazy conflict detection and lazy data versioning. A compile-time analysis approach is also exploited to support larger transactions, so that transient faults detected after a long latency can still be recovered. Experimental evaluation shows that EnHTM can recover from 89.4% of catastrophic failures caused by transient faults, with an average performance overhead of 2.6% in error-free executions.
Keywords :
fault diagnosis; fault tolerant computing; program compilers; system recovery; transaction processing; EnHTM; catastrophic failure; commodity system; compile-time analysis approach; error-free execution; fault detection; fault recovery; hardware transaction memory; hybrid software-hardware implemented low-cost fault tolerance solution; lazy conflict detection; lazy data versioning; light-weight symptom-based mechanism; mainstream commodity market; minimally-modified hardware transactional memory; performance overhead; processor design; serial program; transaction size; transient fault rate; Automation; Manufacturing; Compile-time analysis; HTM; Symptom-based mechanism; Transient faults;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Digital Manufacturing and Automation (ICDMA), 2013 Fourth International Conference on
Conference_Location :
Qingdao
Type :
conf
DOI :
10.1109/ICDMA.2013.130
Filename :
6598051