• DocumentCode
    1662897
  • Title

    Recovery schemes for high availability and high performance distributed real-time computing

  • Author

    Lundberg, Lars ; Häggander, Daniel ; Klonowska, Kamilla ; Svahnberg, Charlie

  • Author_Institution
    Dept. of Software Eng. & Comput. Sci., Blekinge Inst. of Technol., Ronneby, Sweden
  • fYear
    2003
  • Abstract
    Clusters and distributed systems offer fault tolerance and high performance through load sharing, and are thus attractive in real-time applications. When all computers are up and running, we would like the load to be evenly distributed among the computers. When one or more computers fail, the load must be redistributed. The redistribution is determined by the recovery scheme. The recovery scheme should keep the load as evenly distributed as possible even when the most unfavorable combinations of computers break down, i.e., we want to optimize the worst-case behavior. In this paper we define recovery schemes that are optimal for a number of important cases. We also show that the problem of finding optimal recovery schemes corresponds to the mathematical problem of finding sequences of integers with minimal sum and for which all sums of subsequences are unique.
  • Keywords
    fault tolerant computing; real-time systems; resource allocation; system recovery; clusters; distributed systems; fault tolerance; high performance distributed real-time computing; load sharing; mathematical problem; recovery schemes; worst-case behavior; Application software; Availability; Computer applications; Computer science; Concurrent computing; Distributed computing; Fault tolerant systems; High performance computing; Real time systems; Software engineering
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium, 2003. Proceedings. International
  • ISSN
    1530-2075
  • Print_ISBN
    0-7695-1926-1
  • Type

    conf

  • DOI
    10.1109/IPDPS.2003.1213241
  • Filename
    1213241