• DocumentCode
    2992289
  • Title
    Timely Virtual Machine Migration for Pro-active Fault Tolerance
  • Author
    Polze, Andreas ; Troger, Peter ; Salfner, Felix

  • Author_Institution
    Oper. Syst. & Middleware Group, Univ. of Potsdam, Potsdam, Germany
  • fYear
    2011
  • fDate
    28-31 March 2011
  • Firstpage
    234
  • Lastpage
    243
  • Abstract
    Next-generation processor and memory technologies will provide tremendously increasing computing and memory capacities for application scaling. However, this comes at a price: due to the growing number of transistors and shrinking structural sizes, the overall reliability of future server systems is about to suffer significantly. This makes reactive fault tolerance schemes less appropriate for server applications under reliability and timeliness constraints. We propose an architectural blueprint for managing server system dependability in a pro-active fashion, in order to keep service-level promises for response times and availability even with increasing hardware failure rates. We introduce the concept of anticipatory virtual machine migration, which proactively moves computation away from faulty or suspicious machines. The migration decision is based on health indicators at various system levels that are combined into a global probabilistic reliability measure. Based on this measure, live migration techniques can be triggered in order to move computation to healthy machines even before a failure brings the system down.
  • Keywords
    fault tolerant computing; virtual machines; architectural blueprint; global probabilistic reliability; health indicator; healthy machine; memory technology; migration decision; next generation processor; proactive fault tolerance; reactive fault tolerance; server system management; system reliability; timely virtual machine migration; Fault tolerance; Fault tolerant systems; Hardware; Monitoring; Operating systems; Servers; Virtual machining; failure prediction; live migration; meta-learning; monitoring; virtualization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Object/Component/Service-Oriented Real-Time Distributed Computing Workshops (ISORCW), 2011 14th IEEE International Symposium on
  • Conference_Location
    Newport Beach, CA
  • Print_ISBN
    978-1-4577-0303-4
  • Electronic_ISBN
    978-0-7695-4377-2
  • Type
    conf
  • DOI
    10.1109/ISORCW.2011.42
  • Filename
    5753533