DocumentCode :
1949801
Title :
FTXen: Making hypervisor resilient to hardware faults on relaxed cores
Author :
Xinxin Jin ; Soyeon Park ; Tianwei Sheng ; Rishan Chen ; Zhiyong Shan ; Yuanyuan Zhou
fYear :
2015
fDate :
7-11 Feb. 2015
Firstpage :
451
Lastpage :
462
Abstract :
As CMOS technology scales, increasingly smaller transistor components are susceptible to a variety of in-field hardware errors. Traditional redundancy techniques to deal with the increasing error rates are expensive and energy inefficient. To address this emerging challenge, many researchers have recently proposed the idea of relaxed hardware design and exposing errors to software. For such relaxed hardware to become a reality, it is crucially important for system software, such as the virtual machine hypervisor, to be resilient to hardware faults. To address the above fundamental software challenge in enabling relaxed hardware design, we are making a major effort in restructuring an important part of system software, namely the virtual machine hypervisor, to be resilient to faulty cores. A fault in a relaxed core can only affect those virtual machines (and applications) running on that core, but the hypervisor and other virtual machines remain intact and continue providing services. We have redesigned every component of Xen, a large, popular virtual machine hypervisor, to achieve such error resiliency. This paper presents our design and implementation of the restructured Xen (we refer to it as FTXen). Our experimental evaluation on real systems shows that FTXen adds minimal application overhead, and scales well to different ratios of reliable and relaxed cores. Our results with random fault injection show that FTXen can successfully survive all injected hardware faults.
Keywords :
fault tolerant computing; virtual machines; CMOS technology; FTXen; error resiliency; faulty cores; hardware faults; in-field hardware errors; random fault injection; relaxed cores; relaxed hardware design; system software; transistor components; virtual machine hypervisor; Data structures; Hardware; Reliability; System software; Virtual machine monitors; Virtual machining;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
High Performance Computer Architecture (HPCA), 2015 IEEE 21st International Symposium on
Conference_Location :
Burlingame, CA
Type :
conf
DOI :
10.1109/HPCA.2015.7056054
Filename :
7056054