DocumentCode :
1799889
Title :
Harnessing Soft Computations for Low-Budget Fault Tolerance
Author :
Khudia, Daya Shanker ; Mahlke, Scott
Author_Institution :
Adv. Comput. Archit. Lab., Univ. of Michigan, Ann Arbor, MI, USA
fYear :
2014
fDate :
13-17 Dec. 2014
Firstpage :
319
Lastpage :
330
Abstract :
A growing number of applications from domains such as multimedia, machine learning, and computer vision are inherently fault tolerant. However, for these soft workloads, not all computations are fault tolerant (e.g., a loop trip count). In this paper, we propose a compiler-based approach that takes advantage of the soft computations inherent in this class of workloads to bring down the cost of software-only transient fault detection. The technique works by identifying a small subset of critical variables that are necessary for correct macro-operation of the program. Traditional duplication and comparison are used to protect these variables. For the remaining variables and temporaries that only affect the micro-operation of the program, strategic expected value checks are inserted into the code. Intuitively, a computation-chain result near the expected value is either correct or close enough to the correct result that it does not matter for non-critical variables. Overall, the proposed solution has, on average, only 19.5% performance overhead and reduces the number of silent data corruptions from 15% down to 7.3% and user-visible silent data corruptions from 3.4% down to 1.2% in comparison to an unmodified application. This user-visible silent data corruption rate is even lower than that of a traditional full duplication scheme, which has, on average, 57% overhead.
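The abstract combines two protection mechanisms: exact duplication and comparison for critical variables, and cheaper expected value checks for non-critical ones. The following is a minimal illustrative sketch of that split, not the authors' implementation. It assumes that an expected value check can be modeled as a range check around a profiled value (the record does not give the exact check form), and it uses re-running a Python function as a stand-in for the duplicated instruction stream a compiler would emit. The names `duplicated`, `ExpectedValueCheck`, `FaultDetected`, and `brighten`, and the parameters `expected` and `tolerance`, are all hypothetical.

```python
from dataclasses import dataclass


class FaultDetected(Exception):
    """Raised when a check finds an inconsistent value (hypothetical hook)."""


def duplicated(compute, *args):
    """Critical-variable protection: compute twice and compare exactly.

    A compiler-based scheme would duplicate instructions; calling the
    function twice here is only an illustrative stand-in.
    """
    primary = compute(*args)
    shadow = compute(*args)
    if primary != shadow:
        raise FaultDetected(f"duplicate mismatch: {primary!r} != {shadow!r}")
    return primary


@dataclass
class ExpectedValueCheck:
    """Non-critical check (assumed form): accept values near an expected value.

    `expected` and `tolerance` are hypothetical, e.g. taken from profiling.
    """
    expected: float
    tolerance: float

    def check(self, value: float) -> float:
        # Flag only results that stray far from the expected value.
        if abs(value - self.expected) > self.tolerance:
            raise FaultDetected(
                f"{value} outside {self.expected} +/- {self.tolerance}")
        return value


def brighten(pixels, gain):
    """Hypothetical soft workload: scale pixel intensities by `gain`."""
    # Critical: the loop trip count must be exact, so it is duplicated.
    n = duplicated(len, pixels)
    # Non-critical: each pixel result only needs to be near its expected range.
    pixel_check = ExpectedValueCheck(expected=128.0, tolerance=128.0)
    out = []
    for i in range(n):
        out.append(pixel_check.check(pixels[i] * gain))
    return out
```

Here `brighten([10, 20], 2.0)` returns `[20.0, 40.0]`, while `brighten([200], 2.0)` raises `FaultDetected` because 400.0 lies outside the assumed range of 0 to 256. The split mirrors the abstract: exact comparison where a wrong value would change control flow, and a cheaper tolerance check where a slightly wrong value is acceptable.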
Keywords :
program compilers; software fault tolerance; compiler-based approach; fault tolerant computations; full duplication scheme; low-budget fault tolerance; performance overhead; program macrooperation; soft computations; soft workloads; software-only transient fault detection; user-visible silent data corruptions; Benchmark testing; Circuit faults; Decoding; Fault tolerance; Fault tolerant systems; Multimedia communication; Optimization; Compiler Analysis; Soft Errors;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Microarchitecture (MICRO), 2014 47th Annual IEEE/ACM International Symposium on
Conference_Location :
Cambridge
ISSN :
1072-4451
Type :
conf
DOI :
10.1109/MICRO.2014.33
Filename :
7011398