Title :
Distributed Modular Monitoring (DiMMon) Approach to Supercomputer Monitoring
Author :
Konstantin Stefanov;Vladimir Voevodin
Author_Institution :
Res. Comput. Center, Moscow State Univ., Moscow, Russia
Abstract :
In this work we propose a design for a new distributed modular monitoring system framework that combines both monitoring tasks (supercomputer health monitoring and performance monitoring) in a single monitoring system. Our approach allows different parts of the monitoring system to process only the data needed for the task assigned to each part. Another feature of our framework is the ability to calculate performance metrics on the fly by dynamically creating processing modules for every job or other object of interest.
Keywords :
"Monitoring","Supercomputers","Measurement","Conferences","Filtering","Data collection"
Conference_Titel :
Cluster Computing (CLUSTER), 2015 IEEE International Conference on
DOI :
10.1109/CLUSTER.2015.83