CASITA: A Tool for Identifying Critical Optimization Targets in Distributed Heterogeneous Applications

Author

Schmitt, Felix ; Stolle, Jonas ; Dietrich, Robert

Author_Institution

Center for Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden, Germany

fYear

2014

Firstpage

186

Lastpage

195

Abstract

Programming high-performance computing systems has become more complex over time. Several layers of parallelism need to be exploited to utilize the available resources efficiently. To support application developers and performance analysts, we propose a technique for identifying the most performance-critical optimization targets in distributed heterogeneous applications. We have developed CASITA, a tool that uses an execution trace and knowledge of the programming models MPI, OpenMP, and CUDA, as well as the hierarchy among them, to build a distributed event dependency graph. After locating wait states in this graph, we detect their root causes and compute the critical path, an important property for performance optimization. Compared to existing analysis approaches, we incorporate the hierarchy of multiple programming models and derive a metric from both the time an activity spends on the critical path and the waiting time it causes. For visualization, CASITA enriches the input trace with additional counter information so that results can be inspected in the Vampir trace viewer.

Keywords

application program interfaces; graph theory; message passing; parallel architectures; parallel programming; CASITA tool; CUDA programming model; MPI programming model; OpenMP programming model; Vampir trace viewer; activity metric; counter information; critical optimization target identification; critical path; distributed event dependency graph; distributed heterogeneous applications; execution trace; high-performance computing system programming; multiple programming model hierarchy; parallelism layers; performance optimizations; root cause detection; time metric; wait states; Analytical models; Computational modeling; Graphics processing units; Kernel; Optimization; Programming; Synchronization; CUDA; MPI; OpenMP; critical path analysis; performance analysis; performance optimization

fLanguage

English

Publisher

IEEE

Conference_Titel

Parallel Processing Workshops (ICPPW), 2014 43rd International Conference on

ISSN

1530-2016

Type

conf

DOI

10.1109/ICPPW.2014.35

Filename

7103453