Title :
A Low-Cost Fault Tolerance Technique in Multi-media Applications through Configurability
Author :
Lanfang Tan ; Ying Tan
Author_Institution :
Nat. Lab. of Parallel & Distrib. Process., Changsha, China
Abstract :
As chip densities and clock rates increase, processors are becoming more susceptible to transient faults that affect program correctness. Fault tolerance is therefore increasingly important in computing systems. Two major concerns of fault tolerance techniques are: a) improving system reliability by detecting transient errors and b) reducing performance overhead. In this study, we propose a configurable fault tolerance technique targeting both high reliability and low performance overhead for multi-media applications. The basic principle is fault tolerance configurability: different degrees of fault tolerance are applied to different parts of the source code of a multi-media application. First, a primary analysis is performed at the source code level to identify the critical statements. Second, a fault injection process combined with a statistical analysis is used to validate the partition with respect to a confidence degree. Finally, checksum-based fault tolerance and instruction duplication are applied to critical statements, while no fault tolerance mechanism is applied to non-critical parts. Performance experiments demonstrate that our configurable fault tolerance technique yields significant performance gains compared with duplicating all instructions. The fault coverage of the scheme is also evaluated: fault injection results show that about 90% of outputs are correct at the application level with just 20% runtime overhead.
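The selective protection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`checksum`, `duplicated`, `scale_block`), the additive checksum, the statement-level duplication, and the choice of which statements count as critical are all assumptions made for illustration only.

```python
# Illustrative sketch only; every name and design choice here is hypothetical
# and is not taken from the paper.

class FaultDetected(Exception):
    """Raised when a redundancy check disagrees."""


def checksum(values):
    # Assumed scheme: a simple additive checksum over integer samples.
    return sum(values) & 0xFFFFFFFF


def duplicated(compute, *args):
    # Duplication at statement granularity: evaluate twice and compare.
    first = compute(*args)
    second = compute(*args)
    if first != second:
        raise FaultDetected("duplicate results differ")
    return first


def scale_block(samples, gain, expected_sum):
    # Critical (assumed): verify the input block against its stored checksum.
    if checksum(samples) != expected_sum:
        raise FaultDetected("checksum mismatch")
    # Critical (assumed): the loop bound controls memory accesses,
    # so it is computed redundantly.
    n = duplicated(len, samples)
    # Non-critical (assumed): a slightly wrong sample value may still give
    # an acceptable output, so no protection is applied here.
    return [min(255, samples[i] * gain) for i in range(n)]


block = [10, 20, 30]
stored = checksum(block)
print(scale_block(block, 2, stored))
```

In this sketch the cost of protection is paid only on the checksum verification and the duplicated loop bound, while the per-sample arithmetic runs unprotected, which mirrors the trade-off between coverage and overhead that the abstract describes.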
Keywords :
multimedia systems; software fault tolerance; statistical analysis; application-level correctness; checksum-based fault tolerance; confidence degree; fault injection process; fault tolerance configurability; instruction duplication; low-cost fault tolerance technique; multimedia applications; performance overhead reduction; primary analysis; source code level; system reliability; transient error detection; Benchmark testing; Fault tolerance; Fault tolerant systems; Multimedia communication; Registers; checksum; configurable fault tolerance; critical segments; multi-media applications;
Conference_Title :
Quality Software (QSIC), 2013 13th International Conference on
Conference_Location :
Nanjing
DOI :
10.1109/QSIC.2013.25