• DocumentCode
    2072712
  • Title

    Dependability analysis of fault-tolerant multiprocessor systems by probabilistic simulation

  • Author

    Danilenko, Ivan ; Dmitrieva, Elena ; Tsapko, Gennadij

  • Author_Institution
    Tomsk Polytech. Univ., Russia
  • Volume
    1
  • fYear
    2001
  • fDate
    26 Jun-3 Jul 2001
  • Firstpage
    134
  • Abstract
    The objective of this research is to develop a new approach for evaluating the dependability of fault-tolerant computer systems. Dependability has traditionally been evaluated through combinatorial and Markov modelling. These analytical techniques have several limitations, which can restrict their applicability. Simulation avoids many of these limitations, allowing for a more precise representation of system attributes than is feasible with analytical modelling. However, the computational demands of simulating a system in detail, at a low abstraction level, currently prohibit evaluation of high-level dependability metrics such as reliability and availability. The new approach abstracts a system at the architectural level, and employs life testing through simulated fault-injection to accurately and efficiently measure dependability. The simulation models needed to implement this approach are derived, in part, from the published results of computer performance studies and low-level fault-injection experiments. The developed probabilistic models of processor, memory and fault-tolerant mechanisms take into account such properties of real systems as error propagation, different failure modes, event dependency and concurrency. They have been integrated with a workload model and a statistical analysis module into a generalised software tool. The effectiveness of this approach was demonstrated through the analysis of several multiprocessor architectures.
  • Keywords
    fault tolerant computing; multiprocessing systems; performance evaluation; probability; statistical analysis; virtual machines; computer performance; error propagation; experiments; fault tolerant computer systems; fault-injection; life testing; multiprocessor systems; probabilistic models; simulation; system dependability; workload model; Abstracts; Analytical models; Availability; Computational modeling; Computer performance; Computer simulation; Fault tolerant systems; Life testing; Mechanical factors; Multiprocessing systems
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Science and Technology, 2001. KORUS '01. Proceedings. The Fifth Russian-Korean International Symposium on
  • Conference_Location
    Tomsk
  • Print_ISBN
    0-7803-7008-2
  • Type

    conf

  • DOI
    10.1109/KORUS.2001.975079
  • Filename
    975079
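
The abstract describes estimating dependability by life testing with simulated fault injection at the architectural level. The sketch below illustrates that general idea only; it is not the authors' tool or models. Every name and parameter in it (`simulate_lifetime`, `estimate_reliability`, `n_proc`, `k_required`, `fail_rate`, `coverage`) is hypothetical, and the exponential fault arrivals and single coverage probability are simplifying assumptions chosen for illustration.

```python
import math
import random


def simulate_lifetime(n_proc, k_required, fail_rate, coverage, rng):
    """Return the simulated time to system failure for one life test.

    Assumptions (illustrative, not taken from the paper): processors fail
    independently with exponential lifetimes; each fault is detected and
    isolated with probability `coverage`; an uncovered fault propagates
    and brings the whole system down.
    """
    t = 0.0
    working = n_proc
    while working >= k_required:
        # Inject the next fault among the processors still working.
        t += rng.expovariate(working * fail_rate)
        if rng.random() > coverage:
            return t  # uncovered fault: error propagates, system fails
        working -= 1  # covered fault: the faulty processor is removed
    return t  # too few processors remain to carry the workload


def estimate_reliability(mission_time, trials, seed=0, **params):
    """Estimate R(mission_time) with a 95% normal-approximation half-width."""
    rng = random.Random(seed)
    survived = sum(
        simulate_lifetime(rng=rng, **params) > mission_time
        for _ in range(trials)
    )
    r = survived / trials
    half_width = 1.96 * math.sqrt(r * (1.0 - r) / trials)
    return r, half_width


if __name__ == "__main__":
    r, hw = estimate_reliability(
        1000.0, 20000, n_proc=3, k_required=1, fail_rate=1e-3, coverage=0.95
    )
    print(f"R(1000 h) ~ {r:.3f} +/- {hw:.3f}")
```

With a single processor and perfect coverage, the estimate should approach the analytical value exp(-fail_rate * mission_time), which gives a simple sanity check before moving to configurations that analytical models handle less easily.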