DocumentCode :
827062
Title :
Source-Code-Correlated Cache Coherence Characterization of OpenMP Benchmarks
Author :
Marathe, Jaydeep ; Mueller, Frank
Author_Institution :
Dept. of Comput. Sci., North Carolina State Univ., Raleigh, NC
Volume :
18
Issue :
6
fYear :
2007
fDate :
6/1/2007
Firstpage :
818
Lastpage :
834
Abstract :
Cache coherence in shared-memory multiprocessor systems has been studied mostly from an architecture viewpoint, often by means of aggregating metrics. In many cases, aggregate events provide insufficient information for programmers to understand and optimize the coherence behavior of their applications. A better understanding would be given by source code correlations of not only aggregate events, but also finer granularity metrics directly linked to high-level source code constructs, such as source lines and data structures. In this paper, we explore a novel application-centric approach to studying coherence traffic. We develop a coherence analysis framework based on incremental coherence simulation of actual reference traces. We provide tool support to extract these reference traces and synchronization information from OpenMP threads at runtime using dynamic binary rewriting of the application executable. These traces are fed to ccSIM, our cache-coherence simulator. The novelty of ccSIM lies in its ability to relate low-level cache coherence metrics (such as coherence misses and their causative invalidations) to high-level source code constructs including source code locations and data structures. We explore the degree of freedom in interleaving data traces from different processors and assess simulation accuracy in comparison to metrics obtained from hardware performance counters. Our quantitative results show that: 1) Cache coherence traffic can be simulated with a considerable degree of accuracy for SPMD programs, as the invalidation traffic closely matches the corresponding hardware performance counters. 2) Detailed, high-level coherence statistics are very useful in detecting, isolating, and understanding coherence bottlenecks. We use ccSIM with several well-known benchmarks and find coherence optimization opportunities leading to significant reductions in coherence traffic and savings in wall-clock execution time.
Keywords :
cache storage; distributed shared memory systems; OpenMP threads; data structures; dynamic binary rewriting; shared-memory multiprocessor system; single program multiple-data program; source-code-correlated cache coherence characterization; Aggregates; Analytical models; Counting circuits; Data mining; Data structures; Hardware; Multiprocessing systems; Programming profession; Traffic control; Yarn; Cache memories; SMPs; coherence protocols; program instrumentation; simulation
fLanguage :
English
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
1045-9219
Type :
jour
DOI :
10.1109/TPDS.2007.1058
Filename :
4180348