• DocumentCode
    2439473
  • Title

    Improving the performance of hypervisor-based fault tolerance

  • Author

    Zhu, Jun ; Dong, Wei ; Jiang, Zhefu ; Shi, Xiaogang ; Xiao, Zhen ; Li, Xiaoming

  • Author_Institution
    Sch. of Electron. Eng. & Comput. Sci., Peking Univ., Beijing, China
  • fYear
    2010
  • fDate
    19-23 April 2010
  • Firstpage
    1
  • Lastpage
    10
  • Abstract
    Hypervisor-based fault tolerance (HBFT), a checkpoint-recovery mechanism, is an emerging approach to sustaining mission-critical applications. Based on virtualization technology, HBFT provides an economical and transparent solution. However, these advantages currently come at the cost of substantial overhead during failure-free execution, especially for memory-intensive applications. This paper presents an in-depth examination of HBFT and options to improve its performance. Based on the behavior of memory accesses among checkpointing epochs, we introduce two optimizations for the memory tracking mechanism: read fault reduction and write fault prediction. These two optimizations improve the mechanism by 31.1% and 21.4%, respectively, for some applications. We then present software-superpage, which efficiently maps large memory regions between virtual machines (VMs). With the above optimizations, HBFT is improved by a factor of 1.4 to 2.2 and achieves about 60% of the performance of the native VM.
  • Keywords
    checkpointing; fault tolerant computing; optimisation; virtual machines; checkpoint-recovery mechanism; hypervisor-based fault tolerance; memory intensive applications; memory tracking; mission-critical applications; optimization; read fault reduction; software superpage; virtual machines; virtualization; write fault prediction; Application software; Checkpointing; Fault tolerance; Fault tolerant systems; Maintenance; Mission critical systems; Read-write memory; Virtual machine monitors; Virtual machining; Virtual manufacturing; Checkpoint; Fault Tolerance; Hypervisor; Recovery; Virtualization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on
  • Conference_Location
    Atlanta, GA
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-4244-6442-5
  • Type

    conf

  • DOI
    10.1109/IPDPS.2010.5470357
  • Filename
    5470357