• DocumentCode
    703924
  • Title

    Eliminating intra-warp conflict misses in GPU

  • Author

    Bin Wang ; Zhuo Liu ; Xinning Wang ; Weikuan Yu

  • Author_Institution
    Dept. of Comput. Sci., Auburn Univ., Auburn, AL, USA
  • fYear
    2015
  • fDate
    9-13 March 2015
  • Firstpage
    689
  • Lastpage
    694
  • Abstract
    Cache indexing functions play a key role in reducing conflict misses by spreading accesses evenly among all sets of cache blocks. Although various methods have been proposed, little attention has been paid to the behavior of conflict misses on GPUs, where threads are organized into warps and execute in lockstep. When intra-warp accesses cannot be coalesced into one or two cache blocks, a condition often referred to as memory divergence, a warp incurs up to SIMD-width (e.g., 32) independent cache accesses. Such a burst of divergent accesses not only increases contention for cache capacity, but also incurs intra-warp associativity conflicts when the accesses are pathologically concentrated in a few cache sets. Because of lockstep execution, the GPU Load/Store units stall when intra-warp concentration exceeds the available cache associativity. Through an in-depth analysis of GPU access patterns, we find that column-majored strided accesses are likely to incur high intra-warp concentration. Based on this analysis, we propose a Full Permutation (FUP) based indexing method that adapts to both large and medium strides in this pattern. Across the 10 highly cache-sensitive GPU applications we have evaluated, FUP eliminates intra-warp associativity conflicts and outperforms two state-of-the-art indexing methods by 22% and 15%, respectively.
  • Keywords
    cache storage; graphics processing units; FUP based indexing method; GPU access patterns; column-majored strided accesses; full permutation; intra-warp associativity conflicts; intra-warp concentration; intra-warp conflict misses elimination; Benchmark testing; Indexing; Instruction sets; Measurement; Parallel processing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015
  • Conference_Location
    Grenoble
  • Print_ISBN
    978-3-9815-3704-8
  • Type

    conf

  • Filename
    7092476
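
The intra-warp concentration described in the abstract can be illustrated with a small sketch. It is not the paper's FUP method, whose details are not given in this record. It only compares conventional modulo set indexing against a generic XOR-based index for one warp issuing strided accesses. The cache geometry (128-byte lines, 32 sets, 4 ways), the 4096-byte stride, and all function names are assumptions chosen for illustration.

```python
from collections import Counter

# Assumed cache geometry (illustrative values, not taken from the paper).
LINE_SIZE = 128   # bytes per cache block
NUM_SETS = 32     # number of cache sets
NUM_WAYS = 4      # associativity
WARP_SIZE = 32    # threads per warp


def set_index_modulo(addr):
    """Conventional indexing: low-order bits of the block address."""
    return (addr // LINE_SIZE) % NUM_SETS


def set_index_xor(addr):
    """Generic XOR hashing (NOT the paper's FUP): fold the low tag bits
    into the index bits so that large power-of-two strides spread out."""
    block = addr // LINE_SIZE
    index = block % NUM_SETS
    tag_low = (block // NUM_SETS) % NUM_SETS
    return index ^ tag_low


def warp_concentration(stride_bytes, index_fn, base=0):
    """Largest number of distinct cache blocks that a single warp maps
    to any one cache set, given a fixed per-thread stride."""
    blocks = {(base + t * stride_bytes) // LINE_SIZE for t in range(WARP_SIZE)}
    per_set = Counter(index_fn(b * LINE_SIZE) for b in blocks)
    return max(per_set.values())


if __name__ == "__main__":
    # Column-style access: each thread is one 4096-byte row apart.
    for name, fn in [("modulo", set_index_modulo), ("xor", set_index_xor)]:
        c = warp_concentration(4096, fn)
        status = "exceeds" if c > NUM_WAYS else "fits within"
        print(f"{name}: concentration {c} {status} {NUM_WAYS}-way associativity")
```

Under these assumptions, modulo indexing sends all 32 blocks of the warp to one set, far beyond the 4 available ways, while the XOR variant places each block in its own set. A coalesced 4-byte stride touches a single block under either function.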