• DocumentCode
    2624630
  • Title

    Job failure prediction in grid environment based on workload characteristics

  • Author

    Fadishei, Hamid ; Saadatfar, Hamid ; Deldari, Hossein

  • Author_Institution
    Parallel & Distrib. Process. Lab., Ferdowsi Univ. of Mashhad, Mashhad, Iran
  • fYear
    2009
  • fDate
    20-21 Oct. 2009
  • Firstpage
    329
  • Lastpage
    334
  • Abstract
    The power of grid technology in aggregating autonomous resources owned by several organizations into a single virtual system has made it popular in compute-intensive and data-intensive applications. The complex and dynamic nature of the grid makes failure of users' jobs fairly probable. Furthermore, traditional methods for job failure recovery have proven costly, so a shift toward proactive and predictive management strategies is needed in such systems. In this paper, an innovative effort is made to predict the fate of jobs submitted to a production grid environment (AuverGrid). By analyzing grid workload traces and extracting patterns describing common failure characteristics, the success or failure status of jobs during 6 months of AuverGrid activity was predicted with around 96% accuracy. The quality of service on the grid can be improved by integrating the results of this work into management services such as scheduling and monitoring.
  • Keywords
    grid computing; learning (artificial intelligence); middleware; AuverGrid environment; compute-intensive application; data-intensive application; job failure prediction; production grid environment; workload characteristic; Computer applications; Concurrent computing; Condition monitoring; Distributed computing; Distributed processing; Failure analysis; Grid computing; Large-scale systems; Machine learning; Power system management;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Conference, 2009. CSICC 2009. 14th International CSI
  • Conference_Location
    Tehran
  • Print_ISBN
    978-1-4244-4261-4
  • Electronic_ISBN
    978-1-4244-4262-1
  • Type

    conf

  • DOI
    10.1109/CSICC.2009.5349381
  • Filename
    5349381