• DocumentCode
    104430
  • Title

    Adaptive Cache Coherence Mechanisms with Producer–Consumer Sharing Optimization for Chip Multiprocessors

  • Author

    Kayi, Abdullah ; Serres, Olivier ; El-Ghazawi, Tarek

  • Author_Institution
    Intel, Hillsboro, OR, USA
  • Volume
    64
  • Issue
    2
  • fYear
    2015
  • fDate
    Feb. 2015
  • Firstpage
    316
  • Lastpage
    328
  • Abstract
    In chip multiprocessors (CMPs), maintaining cache coherence can account for a major performance overhead. Write-invalidate protocols adopted by most CMPs generate a high number of cache-to-cache misses under producer-consumer sharing patterns. Accordingly, this paper presents three cache coherence mechanisms optimized for CMPs. First, to reduce the coherence misses observed in write-invalidate-based protocols, we propose a dynamic write-update mechanism built on top of a write-invalidate protocol. This mechanism is triggered specifically when a producer-consumer sharing pattern is detected. Second, we extend this adaptive protocol with a bandwidth-adaptive mechanism to eliminate performance degradation from write-updates under limited bandwidth. Finally, a proximity-aware mechanism is proposed to extend the base adaptive protocol with latency-based optimizations. Experimental analysis is conducted on a set of scientific applications from the SPLASH-2 and NAS parallel benchmark suites. The proposed mechanisms were shown to reduce coherence misses by up to 48% and, in turn, to speed up applications by up to 30%. The bandwidth-adaptive mechanism is shown to perform well under varying levels of available bandwidth. Results from our proposed proximity-aware extension demonstrated up to 6% performance gain over the base adaptive protocol for 64-core tiled CMP runs. In addition, the analytical model provided good estimates of the performance gains from our adaptive protocols.
  • Keywords
    cache storage; multiprocessing systems; parallel architectures; performance evaluation; 64-core tiled CMP; NAS parallel benchmark suites; SPLASH-2 suites; adaptive cache coherence mechanisms; bandwidth-adaptive mechanism; base adaptive protocol; cache-to-cache misses; chip multiprocessors; dynamic write-update mechanism; latency-based optimizations; performance degradation; producer-consumer sharing optimization; producer-consumer sharing pattern; proximity-aware extension; write-invalidate-based protocols; Bandwidth; Coherence; Multicore processing; Optimization; Protocols; Radiation detectors; Cache coherence; adaptable architectures; chip multiprocessors (CMPs); producer/consumer;
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type

    jour

  • DOI
    10.1109/TC.2013.217
  • Filename
    6671567