• DocumentCode
    2364647
  • Title
    Determination of an optimal retry time in multiple-module computing systems
  • Author
    Hou, Chao-Ju ; Shin, Kang G.
  • Author_Institution
    Dept. of Electr. Eng. & Comput. Sci., Michigan Univ., Ann Arbor, MI, USA
  • fYear
    1993
  • fDate
    25-28 Apr 1993
  • Firstpage
    294
  • Lastpage
    301
  • Abstract
    The optimal amount of time used for retrying an instruction on detection of an error in a computing system is usually determined under the assumption that the system is composed of a single module, within which all fault activities are confined until some module-replacement action is taken. The authors consider fault activities in multiple-module systems. They first relax the single-module assumption and model the fault activities in a multiple-module system as a Markov process. The randomization approach is then applied to decompose the Markov process into a discrete-time Markov chain subordinated to a Poisson process. Using this decomposition, several measures of interest can be derived, such as the conditional probability of successful retry, given a retry period and given that a non-permanent fault has occurred, and the mean time until all modules in the system enter a fault-free state. These measures are used, along with the parameters characterizing fault activities and the costs of recovery techniques, to determine whether retry should be used as a first-step recovery means on detection of an error, and the best retry period subject to a specific probability of successful retry.
  • Keywords
    Markov processes; fault tolerant computing; optimisation; performance evaluation; probability; reliability; stochastic processes; system recovery; Markov process; Poisson process; conditional probability; discrete-time Markov chain; error detection; fault activities; fault-free state; module-replacement action; multiple-module computing systems; optimal retry time; randomization approach; single-module assumption; Chaos; Computer aided instruction; Computer errors; Computer science; Electrical fault detection; Laboratories; Position measurement; Real time systems; Time measurement
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Uncertainty Modeling and Analysis, 1993. Proceedings., Second International Symposium on
  • Conference_Location
    College Park, MD
  • Print_ISBN
    0-8186-3850-8
  • Type
    conf
  • DOI
    10.1109/ISUMA.1993.366753
  • Filename
    366753
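
The randomization (uniformization) step named in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the two-state chain, its rates, and the function names `uniformize` and `transient_distribution` are all hypothetical, chosen only to show how a continuous-time Markov process with generator Q is rewritten as a discrete-time chain P = I + Q/Lambda whose steps are counted by a Poisson process of rate Lambda.

```python
import math


def uniformize(Q):
    """Return (rate, P) with P = I + Q / rate, where rate >= max_i |Q[i][i]|."""
    n = len(Q)
    rate = max(-Q[i][i] for i in range(n))
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)]
         for i in range(n)]
    return rate, P


def transient_distribution(Q, p0, t, tol=1e-12, max_terms=10_000):
    """Approximate p(t) = sum_k Poisson(k; rate*t) * p0 * P^k.

    The series is truncated once the accumulated Poisson mass is within
    `tol` of 1. Suitable for moderate rate*t only (exp(-rate*t) must not
    underflow).
    """
    n = len(Q)
    rate, P = uniformize(Q)
    weight = math.exp(-rate * t)      # Poisson probability of k = 0 jumps
    v = list(p0)                      # row vector p0 * P^k, starting at k = 0
    result = [weight * x for x in v]
    mass = weight
    for k in range(1, max_terms):
        if 1.0 - mass <= tol:
            break
        # One step of the subordinated discrete-time chain.
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= rate * t / k        # Poisson probability of k jumps
        mass += weight
        result = [r + weight * x for r, x in zip(result, v)]
    return result


# Hypothetical toy chain: state 0 = fault active, state 1 = fault-free.
# The fault disappears at rate 2.0 and reappears at rate 0.5 (assumed values).
Q = [[-2.0, 2.0],
     [0.5, -0.5]]
p = transient_distribution(Q, [1.0, 0.0], t=1.0)
print(p)  # p[1] is the probability of being fault-free at time t
```

For this toy chain the result can be checked against the closed form 0.8 * (1 - exp(-2.5 t)) for the probability of being fault-free at time t when starting in the fault-active state.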