• DocumentCode
    2719766
  • Title
    Design and validation of portable communication infrastructure for fault-tolerant cluster middleware
  • Author
    Li, Ming ; Tao, Wenchao ; Goldberg, Daniel ; Hsu, Israel ; Tamir, Yuval
  • Author_Institution
    Dept. of Comput. Sci., California Univ., Los Angeles, CA, USA
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    266
  • Lastpage
    274
  • Abstract
    We describe the communication infrastructure (CI) for our fault-tolerant cluster middleware, which is optimized for two classes of communication: for the applications and for the cluster management middleware. This CI was designed for portability and for efficient operation on top of modern user-level message passing mechanisms. We present a functional fault model for the CI and show how platform-specific faults map to this fault model. Based on this fault model, we have developed a fault injection scheme that is integrated with the CI and is thus portable across different communication technologies. We have used fault injection to validate and evaluate the implementation of the CI itself as well as the cluster management middleware in the presence of communication faults.
  • Keywords
    fault tolerant computing; middleware; resource allocation; software portability; workstation clusters; cluster management; communication infrastructure; fault injection; fault-tolerant cluster middleware; functional fault model; message passing; portability; Application software; Communications technology; Coordinate measuring machines; Fault tolerance; Fault tolerant systems; Laboratories; Message passing; Middleware; Operating systems; Resource management
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing, 2002. Proceedings. 2002 IEEE International Conference on
  • Print_ISBN
    0-7695-2066-9
  • Type
    conf
  • DOI
    10.1109/CLUSTR.2002.1137755
  • Filename
    1137755