• DocumentCode
    2960068
  • Title

    GTI: A Generic Tools Infrastructure for Event-Based Tools in Parallel Systems

  • Author

    Hilbrich, Tobias ; Müller, Matthias S. ; De Supinski, Bronis R. ; Schulz, Martin ; Nagel, Wolfgang E.

  • Author_Institution
    ZIH, Tech. Univ. Dresden, Dresden, Germany
  • fYear
    2012
  • fDate
    21-25 May 2012
  • Firstpage
    1364
  • Lastpage
    1375
  • Abstract
    Runtime detection of semantic errors in MPI applications supports efficient and correct large-scale application development. However, current approaches scale to at most one thousand processes, and design limitations prevent increased scalability. The need for global knowledge for analyses such as type matching and deadlock detection presents a major challenge. We present a scalable tool infrastructure - the Generic Tool Infrastructure (GTI) - that we will use to implement MPI runtime error detection tools and that applies to other use cases. GTI supports simple offloading of tool processing onto extra processes or threads and provides a tree-based overlay network (TBON) for creating scalable tools that analyze global knowledge. We present its abstractions and code generation facilities that ease many hurdles in tool development, including wrapper generation, tool communication, trace reductions, and filters. GTI ultimately allows tool developers to focus on implementing tool functionality instead of the surrounding infrastructure. Further, we demonstrate that GTI supports scalable tool development through a lost message detector and a phase profiler. The former provides a more scalable implementation of important base functionality for MPI correctness checking, while the latter tool demonstrates that GTI can serve as the basis of further types of tools. Experiments with up to 2048 cores show that GTI's scalability features apply to both tools.
  • Keywords
    application program interfaces; parallel programming; program verification; system recovery; GTI; MPI applications; MPI correctness checking; MPI runtime error detection tools; TBON; deadlock detection; design limitations; event-based tools; generic tools infrastructure; lost message detector; parallel systems; phase profiler; runtime semantic errors detection; scalable tool infrastructure; scalable tools; tool communication; tool functionality; trace reductions; tree based overlay network; type matching; wrapper generation; Complexity theory; Detectors; Layout; Libraries; Runtime; Scalability; XML; Message Passing Interface; Runtime error detection; Scalability; Tool infrastructure;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel & Distributed Processing Symposium (IPDPS), 2012 IEEE 26th International
  • Conference_Location
    Shanghai
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-4673-0975-2
  • Type

    conf

  • DOI
    10.1109/IPDPS.2012.123
  • Filename
    6267937