Title :
Scalable methods for monitoring and detecting behavioral equivalence classes in scientific codes
Author :
Gamblin, Todd ; Fowler, Rob ; Reed, Daniel A.
Author_Institution :
Renaissance Comput. Inst., Univ. of North Carolina at Chapel Hill, Chapel Hill, NC
Abstract :
Emerging petascale systems will have many hundreds of thousands of processors, but traditional task-level tracing tools already fail to scale to much smaller systems because the I/O backbones of these systems cannot handle the peak load offered by their cores. Complete event traces of all processes are thus infeasible. To retain the benefits of detailed performance measurement while reducing the volume of collected data, we developed AMPL, a general-purpose toolkit that reduces data volume using stratified sampling. We adopt a scalable sampling strategy, since the sample size required to measure a system varies sub-linearly with process count. By grouping, or stratifying, processes that behave similarly, we can further reduce data overhead while also providing insight into an application's behavior. In this paper, we describe the AMPL toolkit and report our experiences using it on large-scale scientific applications. We show that AMPL can reduce the overhead of tracing scientific applications by an order of magnitude or more, and that our tool scales sub-linearly, so the improvement will be more dramatic on petascale machines. Finally, we illustrate the use of AMPL to monitor applications by performance-equivalent strata, and we show that this technique can allow for further reductions in trace data volume and traced execution time.
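The sub-linear growth of sample size with process count mentioned in the abstract can be illustrated with a textbook finite-population sample-size formula. The sketch below is an illustration under stated assumptions, not AMPL's implementation: the function names, the 95% confidence multiplier (z = 1.96), and the choice to apply the same error bound to every stratum are all assumptions made for this example.

```python
import math


def min_sample_size(population, std_dev, error_bound, z=1.96):
    """Minimum simple-random-sample size needed to estimate a mean over a
    finite population to within +/- error_bound (hypothetical helper).

    Uses the standard form n = n0 / (1 + n0 / N), where
    n0 = (z * sigma / d)^2 is the infinite-population sample size.
    """
    if population <= 0:
        return 0
    n0 = (z * std_dev / error_bound) ** 2
    n = n0 / (1.0 + n0 / population)
    # Sample at least one process, and never more than exist.
    return max(1, min(population, math.ceil(n)))


def stratified_sample_size(strata, error_bound, z=1.96):
    """Total sample size when each stratum is sampled independently.

    `strata` is a list of (process_count, std_dev) pairs. Applying the same
    error bound to each stratum is a simplifying assumption of this sketch.
    """
    return sum(min_sample_size(n, s, error_bound, z) for n, s in strata)


if __name__ == "__main__":
    # Sample size grows far more slowly than the process count.
    for procs in (1_000, 10_000, 100_000):
        print(procs, min_sample_size(procs, std_dev=1.0, error_bound=0.1))

    # Two low-variance groups need far fewer samples than one mixed population.
    print(stratified_sample_size([(500, 0.2), (500, 0.2)], error_bound=0.1))
```

In this sketch, increasing the process count by a factor of 100 raises the required sample only modestly, and splitting the population into groups with low internal variance lowers the total further, which is the intuition behind sampling by strata.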
Keywords :
equivalence classes; performance evaluation; AMPL toolkit; adaptive monitoring and profiling library; behavioral equivalence classes; petascale systems; scalable sampling strategy; scientific codes; trace data volume reduction; traced execution time; Condition monitoring; Ethernet networks; Instruments; Laboratories; Large-scale systems; Petascale computing; Sampling methods; Size measurement; Spine; Supercomputers
Conference_Title :
Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4244-1693-6
Electronic_ISBN :
1530-2075
DOI :
10.1109/IPDPS.2008.4536236