DocumentCode :
3322083
Title :
Verifying Causality between Distant Performance Phenomena in Large-Scale MPI Applications
Author :
Hermanns, Marc-André ; Geimer, Markus ; Wolf, Felix ; Wylie, Brian J N
Author_Institution :
Jülich Supercomputing Centre, Forschungszentrum Jülich, Jülich
fYear :
2009
fDate :
18-20 Feb. 2009
Firstpage :
78
Lastpage :
84
Abstract :
In message-passing applications, the temporal or spatial distance between cause and symptom of a performance problem constitutes a major difficulty in deriving helpful conclusions from performance data. Just knowing the locations of wait states in the program is often insufficient to understand the reason for their occurrence. We present a method for verifying hypotheses on causality between temporally or spatially distant performance phenomena in message-passing applications without altering the application itself. The verification is accomplished by modifying MPI event traces and using them to simulate the hypothetical message-passing behavior. By performing a parallel real-time reenactment of the communication to be simulated using the original execution configuration, we can achieve high scalability and good predictive accuracy in relation to the measured behavior. Not relying on a potentially complex model of the message-passing subsystem, our method is also platform independent.
Keywords :
causality; message passing; parallel processing; MPI event traces; causality; distant performance phenomena; execution configuration; large-scale MPI applications; message-passing applications; message-passing behavior; message-passing subsystem; parallel real-time reenactment; predictive accuracy; spatial distance; temporal distance; Accuracy; Application software; Computational modeling; Computer science; Discrete event simulation; Large-scale systems; Performance evaluation; Predictive models; Scalability; Supercomputers; Causality of Performance Phenomena; Large-scale; Performance Prediction; Performance Simulation; Performance analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel, Distributed and Network-based Processing, 2009 17th Euromicro International Conference on
Conference_Location :
Weimar
ISSN :
1066-6192
Print_ISBN :
978-0-7695-3544-9
Type :
conf
DOI :
10.1109/PDP.2009.50
Filename :
4912918