• DocumentCode
    2787236
  • Title

    Stack Trace Analysis for Large Scale Debugging

  • Author

    Arnold, Dorian C. ; Ahn, Dong H. ; De Supinski, Bronis R. ; Lee, Gregory L. ; Miller, Barton P. ; Schulz, Martin

  • Author_Institution
    Dept. of Comput. Sci., Wisconsin Univ., Madison, WI
  • fYear
    2007
  • fDate
    26-30 March 2007
  • Firstpage
    1
  • Lastpage
    10
  • Abstract
    We present the Stack Trace Analysis Tool (STAT) to aid in debugging extreme-scale applications. STAT can reduce problem exploration spaces from thousands of processes to a few by sampling stack traces to form process equivalence classes, groups of processes exhibiting similar behavior. We can then use full-featured debuggers on representatives from these behavior classes for root cause analysis. STAT scalably collects stack traces over a sampling period to assemble a profile of the application's behavior. STAT routines process the samples to form a call graph prefix tree that encodes common behavior classes over the program's process space and time. STAT leverages MRNet, an infrastructure for tool control and data analyses, to overcome scalability barriers faced by heavy-weight debuggers. We present STAT's design and an evaluation that shows STAT gathers informative process traces from thousands of processes with sub-second latencies, a significant improvement over existing tools. Our case studies of production codes verify that STAT supports the quick identification of errors that were previously difficult to locate.
  • Keywords
    parallel programming; program debugging; program diagnostics; software libraries; software tools; trees (mathematics); STAT routines; Stack Trace Analysis Tool; call graph prefix tree; large scale debugging; parallel application; root cause analysis; Assembly; Data analysis; Debugging; Delay; Large-scale systems; Production; Sampling methods; Scalability; Space exploration; Tree graphs;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium, 2007. IPDPS 2007. IEEE International
  • Conference_Location
    Long Beach, CA
  • Print_ISBN
    1-4244-0910-1
  • Electronic_ISBN
    1-4244-0910-1
  • Type

    conf

  • DOI
    10.1109/IPDPS.2007.370254
  • Filename
    4227982
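
The abstract describes merging sampled stack traces into a call graph prefix tree and grouping processes into equivalence classes. The following is a minimal illustrative sketch of that idea only, not STAT's actual implementation or API. The names `TraceNode`, `build_prefix_tree`, `equivalence_classes`, and `representatives` are hypothetical, and the sample data is invented for illustration. It assumes each sample is a pair of a process rank and a list of frame names ordered from outermost to innermost, and that two processes are equivalent when they exhibited the same set of call paths over the sampling period.

```python
from collections import defaultdict


class TraceNode:
    """One frame in the merged prefix tree (hypothetical structure)."""

    def __init__(self, frame):
        self.frame = frame
        self.ranks = set()    # ranks whose traces pass through this frame
        self.children = {}    # frame name -> TraceNode


def build_prefix_tree(samples):
    """Merge (rank, frames) samples into a single call graph prefix tree."""
    root = TraceNode("<root>")
    for rank, frames in samples:
        node = root
        node.ranks.add(rank)
        for frame in frames:
            if frame not in node.children:
                node.children[frame] = TraceNode(frame)
            node = node.children[frame]
            node.ranks.add(rank)
    return root


def equivalence_classes(samples):
    """Group ranks by the set of call paths each one showed across samples."""
    paths_by_rank = defaultdict(set)
    for rank, frames in samples:
        paths_by_rank[rank].add(tuple(frames))
    classes = defaultdict(set)
    for rank, paths in paths_by_rank.items():
        classes[frozenset(paths)].add(rank)
    return dict(classes)


def representatives(classes):
    """Pick one rank per class (here, the lowest) to attach a debugger to."""
    return sorted(min(ranks) for ranks in classes.values())


# Invented example: three ranks wait in a barrier, one is still computing.
samples = [
    (0, ["main", "solve", "MPI_Barrier"]),
    (1, ["main", "solve", "MPI_Barrier"]),
    (2, ["main", "solve", "compute"]),
    (3, ["main", "solve", "MPI_Barrier"]),
]
tree = build_prefix_tree(samples)
classes = equivalence_classes(samples)
reps = representatives(classes)
```

In this sketch, four processes reduce to two behavior classes, so a full-featured debugger would only need to attach to one representative of each. The sketch runs in a single process and ignores the distributed aggregation that the abstract attributes to MRNet.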