• DocumentCode
    2670221
  • Title
    Snooze: A Scalable, Fault-Tolerant and Distributed Consolidation Manager for Large-Scale Clusters
  • Author
    Feller, Eugen ; Rilling, Louis ; Morin, Christine ; Lottiaux, Renaud ; Leprince, Daniel
  • Author_Institution
    INRIA Centre Rennes, Univ. de Beaulieu, Rennes, France
  • fYear
    2010
  • fDate
    18-20 Dec. 2010
  • Firstpage
    125
  • Lastpage
    132
  • Abstract
    Intelligent workload consolidation and dynamic cluster adaptation offer a great opportunity for energy savings in current large-scale clusters. Because of the heterogeneous nature of these environments, scalable, fault-tolerant and distributed consolidation managers are necessary to manage their workload efficiently and thus conserve energy and reduce operating costs. However, most of the consolidation managers available today do not fulfill these requirements. Instead, they are mostly centralized and designed solely to operate in virtualized environments. In this work, we present the architecture of a novel scalable, fault-tolerant and distributed consolidation manager called Snooze that is able to dynamically consolidate the workload of a software- and hardware-heterogeneous large-scale cluster composed of resources using virtualization and Single System Image (SSI) technologies. To this end, a common cluster monitoring and management API is introduced, which provides uniform and transparent access to the features of the underlying platforms. Our architecture is open to supporting future technologies and can be easily extended with monitoring metrics and algorithms. Finally, a comprehensive use case study demonstrates the feasibility of our approach to managing the energy consumption of a large-scale cluster.
  • Keywords
    fault tolerant computing; power aware computing; API; SSI; distributed consolidation manager; energy conservation; energy savings; fault tolerant manager; large scale clusters; single system image; snooze; virtualized environments; Computer architecture; Fault tolerance; Fault tolerant systems; Hardware; Monitoring; Servers; Software; Cluster; Consolidation; Dynamic Adaptation; Energy Management; Heterogeneity; SSI; Scalability; Virtualization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Green Computing and Communications (GreenCom), 2010 IEEE/ACM Int'l Conference on & Int'l Conference on Cyber, Physical and Social Computing (CPSCom)
  • Conference_Location
    Hangzhou
  • Print_ISBN
    978-1-4244-9779-9
  • Electronic_ISBN
    978-0-7695-4331-4
  • Type
    conf
  • DOI
    10.1109/GreenCom-CPSCom.2010.62
  • Filename
    5724821