Title :
Adaptive Cache Coherence Mechanisms with Producer–Consumer Sharing Optimization for Chip Multiprocessors
Author :
Kayi, Abdullah ; Serres, Olivier ; El-Ghazawi, Tarek
Author_Institution :
Intel, Hillsboro, OR, USA
Abstract :
In chip multiprocessors (CMPs), maintaining cache coherence can incur a major performance overhead. The write-invalidate protocols adopted by most CMPs generate a high number of cache-to-cache misses under producer-consumer sharing patterns. Accordingly, this paper presents three cache coherence mechanisms optimized for CMPs. First, to reduce the coherence misses observed in write-invalidate-based protocols, we propose a dynamic write-update mechanism layered on top of a write-invalidate protocol. This mechanism is triggered specifically upon detection of a producer-consumer sharing pattern. Second, we extend this adaptive protocol with a bandwidth-adaptive mechanism to eliminate the performance degradation that write-updates cause under limited bandwidth. Finally, a proximity-aware mechanism is proposed to extend the base adaptive protocol with latency-based optimizations. Experimental analysis is conducted on a set of scientific applications from the SPLASH-2 and NAS parallel benchmark suites. The proposed mechanisms are shown to reduce coherence misses by up to 48% and, in turn, to improve application performance by up to 30%. The bandwidth-adaptive mechanism is shown to perform well under varying levels of available bandwidth. Results from the proposed proximity-aware extension demonstrate a performance gain of up to 6% over the base adaptive protocol for 64-core tiled CMP runs. In addition, an analytical model provides good estimates of the performance gains from the adaptive protocols.
Keywords :
cache storage; multiprocessing systems; parallel architectures; performance evaluation; 64-core tiled CMP; NAS parallel benchmark suites; SPLASH-2 suites; adaptive cache coherence mechanisms; bandwidth-adaptive mechanism; base adaptive protocol; cache-to-cache misses; chip multiprocessors; dynamic write-update mechanism; latency-based optimizations; performance degradation; producer-consumer sharing optimization; producer-consumer sharing pattern; proximity-aware extension; write-invalidate-based protocols; Bandwidth; Coherence; Multicore processing; Optimization; Protocols; Radiation detectors; Cache coherence; adaptable architectures; chip multiprocessors (CMPs); producer/consumer;
Journal_Title :
Computers, IEEE Transactions on
DOI :
10.1109/TC.2013.217