• DocumentCode
    1966107
  • Title
    Soft error propagation in floating-point programs
  • Author
    Li, Sha ; Li, Xiaoming
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Delaware, Newark, DE, USA
  • fYear
    2010
  • fDate
    9-11 Dec. 2010
  • Firstpage
    239
  • Lastpage
    246
  • Abstract
    As technology scales, VLSI performance has grown exponentially. As feature sizes shrink, however, new challenges such as soft errors (single-event upsets) threaten circuit reliability. Recent studies have addressed soft errors with error detection and correction techniques such as error-correcting codes and redundant execution, but these techniques cost additional storage or performance. In this paper, we present a different approach. We start by building a quantitative understanding of error propagation in software and propose a systematic evaluation of the impact of bit flips caused by soft errors on floating-point operations. Furthermore, we introduce a novel model for dealing with soft errors: we assume that soft errors have occurred in memory and determine how they manifest in program results. Some soft errors can therefore be tolerated if the resulting error is smaller than the intrinsic inaccuracy of floating-point representations or falls within a predefined range. We focus on analyzing error propagation for floating-point arithmetic operations. Our approach is motivated by interval analysis. We model the rounding effect of floating-point numbers, which enables us to simulate and predict the error propagation of single floating-point arithmetic operations for specific soft errors. In other words, we model and simulate the relation between the bit flip rate, which is determined by soft errors in hardware, and the error of floating-point arithmetic operations. The simulation results enable us to tolerate certain types of soft errors without expensive error detection and correction processing.
  • Keywords
    error correction codes; floating point arithmetic; software engineering; VLSI performance; circuit reliability; error correcting codes; floating-point arithmetic operation; floating-point representations; redundant execution technique; soft error propagation; Analytical models; Computational modeling; Computers; Error correction codes; Error probability; Fuses; Predictive models;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Performance Computing and Communications Conference (IPCCC), 2010 IEEE 29th International
  • Conference_Location
    Albuquerque, NM
  • ISSN
    1097-2641
  • Print_ISBN
    978-1-4244-9330-2
  • Type
    conf
  • DOI
    10.1109/PCCC.2010.5682305
  • Filename
    5682305
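
The tolerance idea described in the abstract (a soft error is acceptable when the error it causes in a result stays below floating-point rounding error or a predefined bound) can be illustrated with a minimal sketch. This is not the authors' model, which the abstract says is motivated by interval analysis; it only flips one bit of an IEEE 754 binary64 operand and checks the effect on a single addition. The names flip_bit, relative_error, and tolerable, and the default tolerance of machine epsilon, are assumptions made for illustration.

```python
import struct
import sys


def flip_bit(x: float, bit: int) -> float:
    """Return x with one bit of its IEEE 754 binary64 encoding inverted.

    Bit 0 is the least significant mantissa bit; bit 63 is the sign bit.
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in 0..63")
    (as_int,) = struct.unpack("<Q", struct.pack("<d", x))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


def relative_error(exact: float, observed: float) -> float:
    """Relative error of observed with respect to exact (absolute if exact is 0)."""
    if exact == 0.0:
        return abs(observed)
    return abs(observed - exact) / abs(exact)


def tolerable(a: float, b: float, bit: int,
              tol: float = sys.float_info.epsilon) -> bool:
    """Hypothetical check: does a bit flip in operand a keep a + b within tol?

    The default tolerance (machine epsilon) is an assumed stand-in for the
    'intrinsic inaccuracy' mentioned in the abstract.
    """
    exact = a + b
    faulty = flip_bit(a, bit) + b
    # A NaN or infinite result fails the comparison and is reported as intolerable.
    return relative_error(exact, faulty) <= tol


if __name__ == "__main__":
    # Scan every bit position of the first operand of 1.0 + 1.0.
    for bit in range(64):
        status = "tolerable" if tolerable(1.0, 1.0, bit) else "not tolerable"
        print(f"bit {bit:2d}: {status}")
```

Under these assumptions, flips in the lowest mantissa bits tend to be absorbed by rounding, while flips in the exponent or sign bits change the result by orders of magnitude; a looser, application-defined bound can be passed as tol.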