• DocumentCode
    2915687
  • Title
    Distributed Fault Management for Computational Grids
  • Author
    Affaan, Muhammad ; Ansari, M.A.
  • Author_Institution
    Muhammad Ali Jinnah Univ., Islamabad
  • fYear
    2006
  • fDate
    Oct. 2006
  • Firstpage
    363
  • Lastpage
    368
  • Abstract
    Grid resources have heterogeneous architectures, are geographically distributed, and are interconnected via unreliable network media, so they are at risk of failure. Because a grid environment consists of unreliable resources, fault-tolerance mechanisms cannot be ignored. Some scientific jobs require long commitments of grid resources, whose failures must not be overlooked. These failures need flexible management that also accounts for failure of the fault manager itself. In this paper we propose distributed management of failures without dedicating resources exclusively to this task. Resources performing fault management may also participate in serving long-running user jobs. Each sub-job of the main user job is inspected by an individual resource. In case of failure, the inspector resource takes over in place of the inspected resource. The contributions of this paper are the elimination of a single point of failure and the proposed concept's ability to be integrated with a variety of grid middleware.
  • Keywords
    distributed object management; grid computing; middleware; resource allocation; software fault tolerance; system recovery; computational grids; distributed fault management; distributed management; fault tolerance; grid middleware; grid resources; Application software; Computer architecture; Condition monitoring; Distributed computing; Fault tolerance; Grid computing; Middleware; Nonhomogeneous media; Resource management; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Grid and Cooperative Computing, 2006. GCC 2006. Fifth International Conference
  • Conference_Location
    Hunan
  • Print_ISBN
    0-7695-2694-2
  • Type
    conf
  • DOI
    10.1109/GCC.2006.39
  • Filename
    4031482