• DocumentCode
    1682968
  • Title
    Scalable methods for monitoring and detecting behavioral equivalence classes in scientific codes
  • Author
    Gamblin, Todd ; Fowler, Rob ; Reed, Daniel A.
  • Author_Institution
    Renaissance Comput. Inst., Univ. of North Carolina at Chapel Hill, Chapel Hill, NC
  • fYear
    2008
  • Firstpage
    1
  • Lastpage
    12
  • Abstract
    Emerging petascale systems will have many hundreds of thousands of processors, but traditional task-level tracing tools already fail to scale to much smaller systems because the I/O backbones of these systems cannot handle the peak load offered by their cores. Complete event traces of all processes are thus infeasible. To retain the benefits of detailed performance measurement while reducing volume of collected data, we developed AMPL, a general-purpose toolkit that reduces data volume using stratified sampling. We adopt a scalable sampling strategy, since the sample size required to measure a system varies sub-linearly with process count. By grouping, or stratifying, processes that behave similarly, we can further reduce data overhead while also providing insight into an application's behavior. In this paper, we describe the AMPL toolkit and we report our experiences using it on large-scale scientific applications. We show that AMPL can successfully reduce the overhead of tracing scientific applications by an order of magnitude or more, and we show that our tool scales sub-linearly, so the improvement will be more dramatic on petascale machines. Finally, we illustrate the use of AMPL to monitor applications by performance-equivalent strata, and we show that this technique can allow for further reductions in trace data volume and traced execution time.
  • Keywords
    equivalence classes; performance evaluation; AMPL toolkit; adaptive monitoring and profiling library; behavioral equivalence classes; petascale systems; scalable sampling strategy; scientific codes; trace data volume reduction; traced execution time; Condition monitoring; Ethernet networks; Instruments; Laboratories; Large-scale systems; Petascale computing; Sampling methods; Size measurement; Spine; Supercomputers;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on
  • Conference_Location
    Miami, FL
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-4244-1693-6
  • Electronic_ISBN
    1530-2075
  • Type
    conf
  • DOI
    10.1109/IPDPS.2008.4536236
  • Filename
    4536236