DocumentCode
523599
Title
Verification for fault tolerance of the IBM system z microprocessor
Author
Thompto, Brian W. ; Hoppe, Bodo
Author_Institution
Syst. & Technol. Group, IBM, Austin, TX, USA
fYear
2010
fDate
13-18 June 2010
Firstpage
525
Lastpage
530
Abstract
IBM System z processors are known for their industry-leading Reliability, Availability and Serviceability (RAS). The hardware is designed to provide high resilience against errors and the ability to recover from errors while maintaining a valid architectural state. This paper describes the thorough verification effort required to prove, prior to design tape-out, that the fault tolerance of the IBM System z processor core meets these high expectations. It proposes a multifaceted verification methodology that covers the various aspects of verifying correct error detection, isolation and recovery. Soft errors enlarge the state space of a design significantly. This poses a significant challenge for the functional verification environment, which must tolerate the failures while still expecting architectural compliance. Several fault injection mechanisms are discussed. A special focus is on the novel methodology of Comprehensive Fault Injection (CFI), used to validate and improve the dependability characteristics of the processor core and thereby provide improved Soft Error Resilience (SER). Feedback of the results, together with measurements of efficiency and functional coverage, is an integral part of the overall methodology and allows smart use of the available compute resources.
Keywords
Decision support systems; Fault tolerant systems; Microprocessors; CFI; RAS; SER; error detection; error recovery; fault injection
fLanguage
English
Publisher
IEEE
Conference_Titel
Design Automation Conference (DAC), 2010 47th ACM/IEEE
Conference_Location
Anaheim, CA, USA
ISSN
0738-100X
Print_ISBN
978-1-4244-6677-1
Type
conf
Filename
5522658