• DocumentCode
    896918
  • Title

    Resource allocation for primary-site fault-tolerant systems

  • Author

    Huang, Yennun ; Tripathi, Satish K.

  • Author_Institution
    AT&T Bell Labs, Murray Hill, NJ, USA
  • Volume
    19
  • Issue
    2
  • fYear
    1993
  • fDate
    2/1/1993 12:00:00 AM
  • Firstpage
    108
  • Lastpage
    119
  • Abstract
    Resource allocation for a distributed system employing the primary-site approach to fault tolerance is discussed. Two kinds of systems are considered. The first consists of fault-tolerant nodes, where each node has many duplicated servers. One server is the primary, which serves user requests, and the rest are backups. The second does not have fault-tolerant nodes. To tolerate node failures, each node uses other nodes as backups. When a node fails, all requests initially allocated to that node are served by one of its backups. To study resource allocation for such systems, an approximate model for each system is developed. Using these models, efficient allocation algorithms that take into account the failure/repair rates of the system and the fault-tolerance overheads are presented. Experimental results show that the algorithms give optimal or suboptimal allocations. The algorithms, which incur little overhead, can improve system performance significantly over an intuitive allocation algorithm.
  • Keywords
    distributed processing; fault tolerant computing; file servers; performance evaluation; resource allocation; approximate model; distributed system; node failures; primary-site fault-tolerant systems; server; system performance; Availability; Delay; Distributed computing; Fault tolerance; Fault tolerant systems; Helium; Real time systems; Resource management; System performance; Throughput
  • fLanguage
    English
  • Journal_Title
    Software Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0098-5589
  • Type

    jour

  • DOI
    10.1109/32.214829
  • Filename
    214829