• DocumentCode
    2321887
  • Title

    A Fault Tolerance Framework for High Performance Computing in Cloud

  • Author

    Egwutuoha, Ifeanyi P. ; Chen, Shiping ; Levy, David ; Selic, Bran

  • Author_Institution
    Sch. of Electr. & Inf. Eng., Univ. of Sydney, Sydney, NSW, Australia
  • fYear
    2012
  • fDate
    13-16 May 2012
  • Firstpage
    709
  • Lastpage
    710
  • Abstract
    Cloud computing offers new capacity and flexibility to high performance computing (HPC) applications by provisioning a large number of virtual machines for computationally intensive applications. Fault tolerance allows HPC systems in the cloud with multiple nodes to complete the execution of computationally intensive applications in the presence of faults. The most commonly used fault tolerance technique for HPC is checkpoint/restart. However, checkpoint/restart increases the wall clock time of application execution, which in turn increases the execution cost. In this paper we present a fault tolerance framework for high performance computing in the cloud. The framework proposes using process level redundancy (PLR) techniques to reduce the wall clock time of the execution of computationally intensive applications.
  • Keywords
    checkpointing; cloud computing; fault tolerant computing; virtual machines; checkpoint/restart; computational intensive application; fault tolerance framework; flexibility solution; high performance computing; process level redundancy; wall clock time; Circuit faults; Fault tolerance; Fault tolerant systems; Program processors; Computational Intensive Applications; High Performance Computing (HPC); Process Level Redundant (PLR)
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster, Cloud and Grid Computing (CCGrid), 2012 12th IEEE/ACM International Symposium on
  • Conference_Location
    Ottawa, ON
  • Print_ISBN
    978-1-4673-1395-7
  • Type

    conf

  • DOI
    10.1109/CCGrid.2012.80
  • Filename
    6217495