DocumentCode :
2318318
Title :
Hierarchical approach to accurate fault modeling for system evaluation
Author :
Kalbarczyk, Z. ; Ries, G. ; Lee, M.S. ; Xiao, Y. ; Patel, J. ; Iyer, R.K.
Author_Institution :
Center for Reliable & High Performance Comput., Illinois Univ., Urbana, IL, USA
fYear :
1998
fDate :
7-9 Sep 1998
Firstpage :
249
Lastpage :
258
Abstract :
This paper presents a hierarchical simulation methodology that enables accurate system evaluation under realistic faults and conditions. In this methodology, the effects of low-level (i.e., transistor- or circuit-level) faults are propagated to higher levels (i.e., the system level) using fault dictionaries. The primary fault model is obtained via simulation of the transistor-level effect of a radiation particle penetrating the device. The resulting current burst is used as a fault model in the circuit-level simulation and is injected into the nodes of a circuit/subcircuit. The latched outputs are collected in a fault dictionary and applied in conducting fault injection at the chip level under a selected workload. Faults injected at the chip level result in memory corruption, which is used as a fault model in the system-level simulation. When an application terminates, either normally or abnormally, the overall fault impact on the software behavior is quantified and analyzed. The simulation method is demonstrated and validated in a case study of Myrinet, a commercial high-speed network. The study shows that the proposed approach offers high confidence in the evaluation results, as the system is analyzed in the presence of realistic fault conditions. It also demonstrates that the conducted analysis can be used to improve system dependability by identifying recovery mechanisms for failures observed during the experiments.
Keywords :
circuit analysis computing; fault tolerant computing; performance evaluation; Myrinet; accurate fault modeling; circuit-level simulation; fault dictionaries; hierarchical approach; hierarchical simulation methodology; high-speed network; latched outputs; primary fault model; radiation particle; recovery mechanisms; simulation method; system dependability; system evaluation; system-level simulation; transistor-level effect; Analytical models; Application software; Circuit faults; Circuit simulation; Computational modeling; Dictionaries; Failure analysis; Fault diagnosis; High-speed networks; Voltage;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Performance and Dependability Symposium, 1998. IPDS '98. Proceedings. IEEE International
Conference_Location :
Durham, NC
ISSN :
1087-2191
Print_ISBN :
0-8186-8679-0
Type :
conf
DOI :
10.1109/IPDS.1998.707727
Filename :
707727