• DocumentCode
    1244092
  • Title

    IBM z990 soft error detection and recovery

  • Author

    Meaney, Patrick J. ; Swaney, Scott B. ; Sanda, Pia N. ; Spainhower, Lisa

  • Author_Institution
    Technol. Group, IBM Syst., Poughkeepsie, NY, USA
  • Volume
    5
  • Issue
    3
  • fYear
    2005
  • Firstpage
    419
  • Lastpage
    427
  • Abstract
    Soft errors in logic are becoming more significant in the design of computer systems due to increased sensitivities of latches and combinatorial logic and the increased number of transistors on a chip. At the same time, users of computer systems continue to expect higher levels of system reliability. Therefore, the investment in hardware and firmware/software mitigation is likely to continue to rise. The IBM eServer z990 system is designed to detect and recover from myriad instances of soft and permanent errors. The error detection and recovery within the z990 processors and the "nest" chips is described with respect to the system level protection against soft errors.
  • Keywords
    error correction codes; error detection codes; integrated circuit reliability; microprocessor chips; multichip modules; system recovery; IBM eServer z990 system; combinatorial logic; computer systems; error correcting code; latches sensitivities; permanent errors; single event upset; soft error detection; soft error rate; soft error recovery; software mitigation; system level protection; system reliability; z990 processors; CMOS technology; Circuits; Computer errors; Error correction codes; Hardware; Latches; Logic design; Microprogramming; Protection; Reliability; Error-correcting code (ECC); error detection; recovery; reliability, availability, and serviceability (RAS); single-event upset (SEU); soft error rate (SER)
  • fLanguage
    English
  • Journal_Title
    Device and Materials Reliability, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1530-4388
  • Type

    jour

  • DOI
    10.1109/TDMR.2005.859577
  • Filename
    1545901