Title :
Online Critical Path Profiling for Parallel Applications
Author :
Zhu, Wenbin ; Bridges, Patrick G. ; Maccabe, Arthur B.
Author_Institution :
Dept. of Comput. Sci., New Mexico Univ., Albuquerque, NM
Abstract :
Online monitoring of parallel applications is increasingly important for techniques such as load balancing, protocol adaptation, and online anomaly detection. Unfortunately, existing online monitoring techniques only monitor individual hosts in a distributed-memory parallel application. In this paper, we show how a new monitoring technique, message-centric monitoring, can be used for online monitoring of the complete critical path in distributed-memory parallel applications. Results from an MPI-based message-centric monitoring prototype called IMPuLSE show that it has less than 3% runtime overhead, accurately measures whole-system performance as the application runs, and captures data that can be used by nodes to detect unusual system behaviors at runtime.
Keywords :
message passing; parallel processing; system monitoring; IMPuLSE monitoring; MPI; distributed-memory parallel application; load balancing; message-centric monitoring; online anomaly detection; online critical path profiling; online monitoring; parallel applications; protocol adaptation; Application software; Bridges; Computer science; Computerized monitoring; Load management; Protocols; Prototypes; Runtime; Statistical distributions; Subcontracting;
Conference_Title :
Cluster Computing, 2005. IEEE International
Conference_Location :
Burlington, MA
Print_ISBN :
0-7803-9486-0
Electronic_ISBN :
1552-5244
DOI :
10.1109/CLUSTR.2005.347048