• DocumentCode
    827062
  • Title

    Source-Code-Correlated Cache Coherence Characterization of OpenMP Benchmarks

  • Author

    Marathe, Jaydeep ; Mueller, Frank

  • Author_Institution
    Dept. of Comput. Sci., North Carolina State Univ., Raleigh, NC
  • Volume
    18
  • Issue
    6
  • fYear
    2007
  • fDate
    6/1/2007
  • Firstpage
    818
  • Lastpage
    834
  • Abstract
    Cache coherence in shared-memory multiprocessor systems has been studied mostly from an architecture viewpoint, often by means of aggregating metrics. In many cases, aggregate events provide insufficient information for programmers to understand and optimize the coherence behavior of their applications. A better understanding would be given by source code correlations of not only aggregate events, but also finer-granularity metrics directly linked to high-level source code constructs, such as source lines and data structures. In this paper, we explore a novel application-centric approach to studying coherence traffic. We develop a coherence analysis framework based on incremental coherence simulation of actual reference traces. We provide tool support to extract these reference traces and synchronization information from OpenMP threads at runtime using dynamic binary rewriting of the application executable. These traces are fed to ccSIM, our cache-coherence simulator. The novelty of ccSIM lies in its ability to relate low-level cache coherence metrics (such as coherence misses and their causative invalidations) to high-level source code constructs including source code locations and data structures. We explore the degree of freedom in interleaving data traces from different processors and assess simulation accuracy in comparison to metrics obtained from hardware performance counters. Our quantitative results show that: 1) Cache coherence traffic can be simulated with a considerable degree of accuracy for SPMD programs, as the invalidation traffic closely matches the corresponding hardware performance counters. 2) Detailed, high-level coherence statistics are very useful in detecting, isolating, and understanding coherence bottlenecks. We use ccSIM with several well-known benchmarks and find coherence optimization opportunities leading to significant reductions in coherence traffic and savings in wall-clock execution time.
  • Keywords
    cache storage; distributed shared memory systems; OpenMP threads; data structures; dynamic binary rewriting; shared-memory multiprocessor system; single program multiple-data program; source-code-correlated cache coherence characterization; Aggregates; Analytical models; Counting circuits; Data mining; Data structures; Hardware; Multiprocessing systems; Programming profession; Traffic control; Yarn; Cache memories; SMPs; coherence protocols; program instrumentation; simulation
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/TPDS.2007.1058
  • Filename
    4180348