• DocumentCode
    1816136
  • Title
    A Resource Management System for Fault Tolerance in Grid Computing
  • Author
    Lee, HwaMin ; Park, DooSoon ; Hong, Min ; Yeo, Sang-Soo ; Kim, SooKyun ; Kim, SungHoon
  • Author_Institution
    Div. of Comput. Sci. & Eng., Soonchunhyang Univ., Asan, South Korea
  • Volume
    2
  • fYear
    2009
  • fDate
    29-31 Aug. 2009
  • Firstpage
    609
  • Lastpage
    614
  • Abstract
    In grid computing, resource management and fault tolerance services are important issues. The availability of the resources selected for job execution is a primary factor in determining computing performance. Resource failures occur more often in grid computing than in traditional parallel computing. Since resource failures can be fatal to job execution, a fault tolerance service is essential in computational grids. Grid services are also often expected to meet minimum levels of quality of service (QoS) for desirable operation. However, the Globus toolkit does not provide a fault tolerance service that supports fault detection and fault management and satisfies QoS requirements. This paper therefore proposes a fault tolerance service that satisfies QoS requirements in computational grids. To provide this service, we expand the definition of failure to cover failures such as process failure, processor failure, and network failure. We propose a resource scheduling service, a fault detection service, and a fault management service, and present implementation and experimental results.
  • Keywords
    fault tolerant computing; grid computing; middleware; quality of service; software development management; Globus toolkit; fault detection service; fault management service; fault tolerance service; resource management system; resource scheduling service; Computerized monitoring; Condition monitoring; Fault detection; Fault tolerance; Fault tolerant systems; Memory management; Resource management
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Science and Engineering, 2009. CSE '09. International Conference on
  • Conference_Location
    Vancouver, BC
  • Print_ISBN
    978-1-4244-5334-4
  • Electronic_ISBN
    978-0-7695-3823-5
  • Type
    conf
  • DOI
    10.1109/CSE.2009.257
  • Filename
    5283868