DocumentCode
1966107
Title
Soft error propagation in floating-point programs
Author
Li, Sha; Li, Xiaoming
Author_Institution
Dept. of Electr. & Comput. Eng., Univ. of Delaware, Newark, DE, USA
fYear
2010
fDate
9-11 Dec. 2010
Firstpage
239
Lastpage
246
Abstract
As technology scales, VLSI performance has grown exponentially. As feature sizes shrink, however, maintaining circuit reliability becomes harder because of new challenges such as soft errors (single-event upsets). Recent studies have tried to address soft errors with error detection and correction techniques such as error correcting codes and redundant execution. However, these techniques come at the cost of additional storage or lower performance. In this paper, we present a different approach to soft errors. We start by building a quantitative understanding of error propagation in software and propose a systematic evaluation of the impact of bit flips caused by soft errors on floating-point operations. Furthermore, we introduce a novel model for dealing with soft errors. More specifically, we assume that soft errors have occurred in memory and study how these errors manifest in the results of programs. Consequently, some soft errors can be tolerated if the resulting error is smaller than the intrinsic inaccuracy of floating-point representations or lies within a predefined range. We focus on analyzing error propagation for floating-point arithmetic operations. Our approach is motivated by interval analysis. We model the rounding effect of floating-point numbers, which enables us to simulate and predict the error propagation of individual floating-point arithmetic operations for specific soft errors. In other words, we model and simulate the relation between the bit flip rate, which is determined by soft errors in hardware, and the error of floating-point arithmetic operations. The simulation results enable us to tolerate certain types of soft errors without expensive error detection and correction processing.
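The abstract does not spell out the authors' model, so the following is only a rough illustration of the underlying idea: flip one bit of an IEEE 754 double operand, run a single arithmetic operation, and check whether the relative error of the result stays below an assumed tolerance. The helper names `flip_bit`, `propagated_error` and `tolerable`, and the choice of machine epsilon as the default tolerance, are hypothetical choices made for this sketch; they are not the paper's method or its interval-analysis model.

```python
import operator
import struct
import sys

# Assumed tolerance: machine epsilon (2**-52), the gap between 1.0 and the
# next double. The unit roundoff EPS / 2 would be a stricter alternative.
EPS = sys.float_info.epsilon


def flip_bit(x: float, k: int) -> float:
    """Invert bit k of x's IEEE 754 binary64 encoding.

    Bits 0-51 hold the fraction, bits 52-62 the exponent, bit 63 the sign.
    """
    if not 0 <= k < 64:
        raise ValueError("k must be in [0, 63]")
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (y,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << k)))
    return y


def propagated_error(op, a: float, b: float, k: int) -> float:
    """Relative error of op(a, b) when bit k of operand a is flipped."""
    exact = op(a, b)
    faulty = op(flip_bit(a, k), b)
    if exact == 0.0:
        return abs(faulty)  # fall back to absolute error when exact is zero
    return abs(faulty - exact) / abs(exact)


def tolerable(err: float, tol: float = EPS) -> bool:
    """Hypothetical acceptance test: tolerate errors no larger than tol.

    NaN compares False, so corrupted results that become NaN are rejected.
    """
    return err <= tol


if __name__ == "__main__":
    # The same low-order fraction-bit flip in operand a is tolerable after a
    # multiplication but amplified by cancellation in a subtraction.
    for name, op, b in (("mul", operator.mul, 2.0), ("add", operator.add, -1.25)):
        for k in (0, 52, 62, 63):
            err = propagated_error(op, 1.5, b, k)
            print(f"{name} bit {k:2d}: rel. error {err:.3g}, tolerable={tolerable(err)}")
```

Flipping fraction bit 0 of `a = 1.5` yields a relative error of about 1.5e-16 for `a * 2.0`, under the assumed tolerance, but exactly `4 * EPS` for `a + (-1.25)`, because cancellation shrinks the exact result while leaving the absolute error unchanged. Exponent-bit flips (52, 62) and the sign bit (63) produce errors far beyond any rounding-level tolerance.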
Keywords
error correction codes; floating point arithmetic; software engineering; VLSI performance; circuit reliability; error correcting codes; floating-point arithmetic operation; floating-point representations; redundant execution technique; soft error propagation; Analytical models; Computational modeling; Computers; Error correction codes; Error probability; Fuses; Predictive models
fLanguage
English
Publisher
IEEE
Conference_Title
2010 IEEE 29th International Performance Computing and Communications Conference (IPCCC)
Conference_Location
Albuquerque, NM
ISSN
1097-2641
Print_ISBN
978-1-4244-9330-2
Type
conf
DOI
10.1109/PCCC.2010.5682305
Filename
5682305