• DocumentCode
    665563
  • Title
    Towards fast OS rejuvenation: An experimental evaluation of fast OS reboot techniques
  • Author
    Bovenzi, Antonio ; Alonso, J. Marcos ; Yamada, Hiroyoshi ; Russo, S. ; Trivedi, Kishor S.
  • fYear
    2013
  • fDate
    4-7 Nov. 2013
  • Firstpage
    61
  • Lastpage
    70
  • Abstract
    Continuous or high availability is a key requirement for many modern IT systems. Computer operating systems play an important role in IT systems availability. Because of their architectural complexity, they are prone to failures caused by several types of software faults. Software aging causes a non-negligible fraction of these failures: it leads to an accumulation of errors over time, increasing the system failure rate. This phenomenon can be accompanied by performance degradation and, eventually, a system hang or even a crash. As a countermeasure, software rejuvenation entails stopping the system, cleaning its internal state, and resuming its operation. This process usually incurs downtime. For an operating system, the downtime affects every application running on top of it. Several solutions have been developed to speed up operating system boot in order to reduce the downtime overhead. We present a study of two fast OS reboot techniques for rejuvenation of Linux-based operating systems, namely Kexec and Phase-based reboot. The study measures the performance penalty they introduce and the reduction in downtime overhead they achieve. The results reveal that Kexec and Phase-based reboot have no statistically significant performance penalty from the user perspective. However, they may require extra resource (e.g., CPU) usage. Compared with normal Linux and VM reboots, the downtime overhead reduction is 77% for Kexec and 79% for Phase-based reboot, respectively.
  • Keywords
    Linux; software architecture; software metrics; software performance evaluation; system recovery; virtual machines; IT systems availability; Kexec reboot; Linux-based operating systems; VM reboots; architecture complexity; computer operating systems; downtime overhead reduction; fast OS reboot techniques; fast OS rejuvenation; performance penalty; phase-based reboot; software aging; software rejuvenation; system failure rate; Aging; Hardware; Image restoration; Kernel
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Software Reliability Engineering (ISSRE), 2013 IEEE 24th International Symposium on
  • Conference_Location
    Pasadena, CA
  • Type
    conf
  • DOI
    10.1109/ISSRE.2013.6698905
  • Filename
    6698905