• DocumentCode
    3224019
  • Title
    Performance Debugging of GPGPU Applications with the Divergence Map
  • Author
    Coutinho, Bruno ; Sampaio, Diogo ; Pereira, Fernando M Q ; Meira, Wagner, Jr.
  • Author_Institution
    Dept. de Cienc. da Comput., Univ. Fed. de Minas Gerais, Belo Horizonte, Brazil
  • fYear
    2010
  • fDate
    27-30 Oct. 2010
  • Firstpage
    33
  • Lastpage
    40
  • Abstract
    The increasing programmability and the high computational power of Graphics Processing Units (GPUs) make them attractive for general-purpose programming. However, taking full advantage of this execution environment is a challenging task. One of these challenges stems from divergences, a phenomenon that occurs when threads that execute in lock-step are forced to take different program paths due to branches in the code. In the face of divergences, some threads have to wait, idly, while their diverging siblings execute. Optimizing the code to avoid divergences is difficult, because this task demands a deep understanding of programs that might be large and convoluted. To facilitate the detection of divergences, this paper introduces the divergence map, a data structure that indicates the location and the volume of divergences in a program. We build this map via dynamic profiling techniques, which we have implemented on top of an open-source CUDA compiler. To illustrate the importance of the divergence map, we have used it to pinpoint the core regions that must be optimized in well-known public applications. By hand-optimizing some applications, we have obtained speedups of 9-11% on kernels that have already gone through the sieve of many programmers.
  • Keywords
    computer graphic equipment; coprocessors; data structures; program compilers; program debugging; public domain software; GPGPU applications; data structure; divergence map; dynamic profiling techniques; general purpose programming; graphical processing units; open source CUDA compiler; performance debugging; Arrays; Graphics processing unit; Hardware; Instruction sets; Instruments; Kernel; Optimization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Architecture and High Performance Computing (SBAC-PAD), 2010 22nd International Symposium on
  • Conference_Location
    Petropolis
  • ISSN
    1550-6533
  • Print_ISBN
    978-1-4244-8287-0
  • Electronic_ISBN
    1550-6533
  • Type
    conf
  • DOI
    10.1109/SBAC-PAD.2010.38
  • Filename
    5644926