• DocumentCode
    2955970
  • Title

    Virtualization aware job schedulers for checkpoint-restart

  • Author

    Badrinath, R. ; Krishnakumar, R. ; Rajan, R. K Palanivel

  • Author_Institution
    Hewlett-Packard Co., Palo Alto, CA
  • Volume
    2
  • fYear
    2007
  • fDate
    5-7 Dec. 2007
  • Firstpage
    1
  • Lastpage
    7
  • Abstract
    Application checkpoint and restart has been a widely studied problem over the last several decades. Despite an immense volume of theory and several research-project-level implementations, there is very little by way of working solutions for the case of parallel distributed applications (such as MPI programs on a cluster). We describe our experiences in enhancing a job scheduler to leverage mechanisms of a virtual machine environment to support checkpoint-restart. We also describe the basic coordinated checkpoint-restart framework that we implemented and on which this solution is based.
  • Keywords
    checkpointing; message passing; scheduling; virtual machines; checkpoint-restart framework; parallel distributed application; virtual machine environment; virtualization aware job scheduler; Application virtualization; Checkpointing; Chromium; Kernel; Libraries; Linux; Middleware; Operating systems; Processor scheduling; Virtual machining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Systems, 2007 International Conference on
  • Conference_Location
    Hsinchu
  • ISSN
    1521-9097
  • Print_ISBN
    978-1-4244-1889-3
  • Electronic_ISBN
    1521-9097
  • Type

    conf

  • DOI
    10.1109/ICPADS.2007.4447844
  • Filename
    4447844