• DocumentCode
    1933403
  • Title

    IPMI-based Efficient Notification Framework for Large Scale Cluster Computing

  • Author

    Leangsuksun, Chokchai ; Rao, Tirumala ; Tikotekar, Anand ; Scott, Stephen L. ; Libby, Richard ; Vetter, Jeffrey S. ; Fang, Yung-Chin ; Ong, Hong

  • Author_Institution
    Dept. of Comput. Sci., Louisiana Tech. Univ., Ruston, LA
  • Volume
    2
  • fYear
    2006
  • fDate
    16-19 May 2006
  • Firstpage
    23
  • Lastpage
    23
  • Abstract
    The demand for an efficient fault tolerance system has led to the development of complex monitoring infrastructure, which in turn has created an overwhelming task of data and event management. The increasing level of detail at the hardware and software layers clearly affects the scalability and performance of monitoring and management tools. In this paper, we propose a problem notification framework that directly addresses the issue of monitor scalability. We first present the design and implementation of our step-by-step approach to analyzing, filtering, and classifying the plethora of node statistics. Then, we present experimental results to show that our approach needs only minimal system resources and thus has low overhead. Finally, we introduce our Web-based cluster management system that provides hardware controls at both cluster and nodal levels.
  • Keywords
    computer network management; computerised monitoring; workstation clusters; IPMI-based efficient notification framework; Web-based cluster management system; complex monitoring infrastructure; data management; event management; fault tolerance system; hardware controls; large scale cluster computing; management tools; monitor scalability; monitoring tools; node statistics; system resource; Filtering; Hardware; High performance computing; Intelligent sensors; Large-scale systems; Mathematics; Monitoring; Scalability; Software tools; Statistical analysis; High-Availability; IPMI
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing and the Grid, 2006. CCGRID 06. Sixth IEEE International Symposium on
  • Conference_Location
    Singapore
  • Print_ISBN
    0-7695-2585-7
  • Type

    conf

  • DOI
    10.1109/CCGRID.2006.1630918
  • Filename
    1630918