DocumentCode
3001967
Title
Characterizing Load and Communication Imbalance in Large-Scale Parallel Applications
Author
Böhme, David ; Wolf, Felix ; Geimer, Markus
Author_Institution
German Res. Sch. for Simulation Sci., RWTH Aachen Univ., Aachen, Germany
fYear
2012
fDate
21-25 May 2012
Firstpage
2538
Lastpage
2541
Abstract
Load or communication imbalance prevents many codes from taking advantage of the parallelism available on modern supercomputers. We present two scalable methods to highlight imbalance in parallel programs: The first method identifies delays that inflict wait states at subsequent synchronization points, and attributes their costs in terms of resource waste to the original cause. The second method combines knowledge of the critical path with traditional parallel profiles to derive a set of compact performance indicators that help answer a variety of important performance-analysis questions, such as identifying load imbalance, quantifying the impact of imbalance on runtime, and characterizing resource consumption. Both methods employ a highly scalable parallel replay of event traces, making them a suitable analysis instrument for massively parallel MPI programs with tens of thousands of processes.
Keywords
message passing; parallel machines; parallel programming; resource allocation; synchronisation; communication imbalance; critical path; event trace; large-scale parallel application; load imbalance; modern supercomputer; parallel MPI program; parallel profile; parallel replay; parallelism; performance indicator; resource consumption; resource waste; synchronization point; Delay; Educational institutions; Load modeling; Parallel processing; Performance analysis; Runtime; Synchronization;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International
Conference_Location
Shanghai
Print_ISBN
978-1-4673-0974-5
Type
conf
DOI
10.1109/IPDPSW.2012.321
Filename
6270888