Title :
Soft error assessments for servers
Author :
Muller, K. Paul; Sanda, Pia N.
Author_Institution :
Syst. & Technol. Group, Int. Bus. Machines Corp., Poughkeepsie, NY, USA
Abstract :
In order to assess the soft error rate (SER) of a server, it is important not only to quantify the soft error contribution of the individual semiconductor components but also to account for derating and for SER mitigation techniques such as hardening and shielding. Derating describes the fact that not every soft error has an impact: a large number of soft errors are masked by electrical, logical or timing effects and vanish without consequence. Additionally, a server can, to a large degree, be protected from the impact of soft errors by implementing error detection and correction mechanisms. In these cases the impact of a soft error is limited to the extra compute time needed for the correction. Summing the SER contributions of transistors and circuits yields the so-called raw soft error rate, which describes only the bottom layer of the system stack. Powerful protection mechanisms at higher layers can reduce that rate by several orders of magnitude. Awareness of this vertical interaction across the layers of the system stack leads to servers optimized for robustness.
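The following is a rough illustration of the layering described in the abstract. The sketch sums per-component raw rates to obtain a raw SER. It then scales each contribution by a derating factor and by the fraction of errors that escape detection and correction. The simple multiplicative model, the `Component` fields and every numeric value are illustrative assumptions. They are not the authors' assessment method or data from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    """One entry in a hypothetical SER budget (illustrative model only)."""
    name: str
    raw_fit: float   # raw SER in FIT (failures per 10^9 device-hours)
    derating: float  # assumed fraction of raw upsets NOT masked (0..1)
    coverage: float  # assumed fraction of unmasked errors detected and corrected (0..1)

    def __post_init__(self) -> None:
        # Reject values that cannot be fractions or rates.
        if self.raw_fit < 0:
            raise ValueError("raw_fit must be non-negative")
        for field_name in ("derating", "coverage"):
            value = getattr(self, field_name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{field_name} must lie in [0, 1]")


def raw_ser(components: list[Component]) -> float:
    """Raw SER: plain sum of per-component rates (bottom layer of the stack)."""
    return sum(c.raw_fit for c in components)


def effective_ser(components: list[Component]) -> float:
    """Simplified effective SER: each raw rate is scaled by its derating
    factor and by the fraction of errors that escape correction."""
    return sum(c.raw_fit * c.derating * (1.0 - c.coverage) for c in components)


def reduction_orders(components: list[Component]) -> float:
    """Orders of magnitude between raw and effective SER (log10 of the ratio)."""
    return math.log10(raw_ser(components) / effective_ser(components))


if __name__ == "__main__":
    # Hypothetical placeholder values for illustration, not measured data.
    parts = [
        Component("sram_array", raw_fit=1000.0, derating=0.5, coverage=0.999),
        Component("latches", raw_fit=200.0, derating=0.1, coverage=0.9),
    ]
    print(f"raw SER:       {raw_ser(parts):.1f} FIT")
    print(f"effective SER: {effective_ser(parts):.2f} FIT")
    print(f"reduction:     {reduction_orders(parts):.2f} orders of magnitude")
```

With these placeholder numbers, the raw rate of 1200 FIT drops to roughly 2.5 FIT. That reduction spans between two and three orders of magnitude, which mirrors the abstract's point that the raw rate alone overstates the system-level impact. A real assessment would derive derating and coverage per structure rather than assume them.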
Keywords :
error analysis; mainframes; monolithic integrated circuits; radiation hardening (electronics); transistors; SER mitigation; error correction; error detection; individual semiconductor components; server; soft error rate; Charge carrier processes; Circuits; Error analysis; Error correction; Hardware; Microprogramming; Power system protection; Robustness; Single event transient; Switches; SER protection; cross-layer optimization; derating; soft error rates; system assessments;
Conference_Title :
Reliability Physics Symposium (IRPS), 2010 IEEE International
Conference_Location :
Anaheim, CA
Print_ISBN :
978-1-4244-5430-3
DOI :
10.1109/IRPS.2010.5488799