• DocumentCode
    3588930
  • Title
    CASITA: A Tool for Identifying Critical Optimization Targets in Distributed Heterogeneous Applications
  • Author
    Schmitt, Felix ; Stolle, Jonas ; Dietrich, Robert
  • Author_Institution
    Center for Inf. Services & High Performance Comput. (ZIH), Tech. Univ. Dresden, Dresden, Germany
  • fYear
    2014
  • Firstpage
    186
  • Lastpage
    195
  • Abstract
    Programming of high-performance computing systems has become more complex over time. Several layers of parallelism need to be exploited to efficiently utilize the available resources. To support application developers and performance analysts, we propose a technique for identifying the most performance-critical optimization targets in distributed heterogeneous applications. We have developed CASITA, a tool which uses an execution trace and knowledge about the programming models MPI, OpenMP and CUDA, as well as their hierarchy among each other, to build a distributed event dependency graph. After locating wait states in this graph, we detect their root cause and compute the critical path, an important property for performance optimizations. Compared to existing analysis approaches, we incorporate the hierarchy of multiple programming models and derive a metric from both the time an activity spends on the critical path and the waiting time it caused. For the purpose of visualization, CASITA enriches the input trace with additional counter information so that results can be inspected in the Vampir trace viewer.
  • Keywords
    application program interfaces; graph theory; message passing; parallel architectures; parallel programming; CASITA tool; CUDA programming model; MPI programming model; OpenMP programming model; Vampir trace viewer; activity metric; counter information; critical optimization target identification; critical path; distributed event dependency graph; distributed heterogeneous applications; execution trace; high-performance computing system programming; multiple programming model hierarchy; parallelism layers; performance optimizations; root cause detection; time metric; wait states; Analytical models; Computational modeling; Graphics processing units; Kernel; Optimization; Programming; Synchronization; CUDA; MPI; OpenMP; critical path analysis; performance analysis; performance optimization; wait states;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing Workshops (ICPPW), 2014 43rd International Conference on
  • ISSN
    1530-2016
  • Type
    conf
  • DOI
    10.1109/ICPPW.2014.35
  • Filename
    7103453