DocumentCode :
3200615
Title :
Reaping the Benefit of Temporal Silence to Improve Communication Performance
Author :
Lepak, Kevin M. ; Lipasti, Mikko H.
Author_Institution :
Dept. of Electr. & Comput. Eng., Wisconsin Univ., Madison, WI
fYear :
2005
fDate :
20-22 March 2005
Firstpage :
258
Lastpage :
268
Abstract :
Communication misses - those serviced by dirty data in remote caches - are a pressing performance limiter in shared-memory multiprocessors. Recent research has indicated that temporally silent stores can be exploited to substantially reduce such misses, either with coherence protocol enhancements (MESTI); by employing speculation to create atomic silent store-pairs that achieve speculative lock elision (SLE); or by employing load value prediction (LVP). We evaluate all three approaches utilizing full-system, execution-driven simulation, with scientific and commercial workloads, to measure performance. Our studies indicate that accurate detection of elision idioms for SLE is vitally important for delivering robust performance and appears difficult for existing commercial codes. Furthermore, common datapath issues in out-of-order cores cause barriers to speculation and therefore may cause SLE failures unless SLE-specific speculation mechanisms are added to the microarchitecture. We also propose novel prediction and silence detection mechanisms that enable the MESTI protocol to deliver robust performance for all workloads. Finally, we conduct a detailed execution-driven performance evaluation of load value prediction (LVP), another simple method for capturing the benefit of temporally silent stores. We show that while theoretically LVP can capture the greatest fraction of communication misses among all approaches, it is usually not the most effective at delivering performance. This occurs because attempting to hide latency by speculating at the consumer, i.e., predicting load values, is fundamentally less effective than eliminating the latency at the source, by removing the invalidation effect of stores. Applying each method, we observe performance changes in application benchmarks ranging from 1% to 14% for an enhanced version of MESTI, -1.0% to 9% for LVP, -3% to 9% for enhanced SLE, and 2% to 21% for combined techniques.
Keywords :
benchmark testing; cache storage; parallel programming; performance evaluation; protocols; shared memory systems; MESTI protocol; coherence protocol enhancement; common datapath issues; communication misses; execution-driven simulation; load value prediction; microarchitecture; remote caches; shared-memory multiprocessors; speculative lock elision; temporally silent stores; Atomic measurements; Coherence; Delay; Frequency; Microarchitecture; Out of order; Pressing; Protocols; Read-write memory; Robustness;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Performance Analysis of Systems and Software, 2005. ISPASS 2005. IEEE International Symposium on
Conference_Location :
Austin, TX
Print_ISBN :
0-7803-8965-4
Type :
conf
DOI :
10.1109/ISPASS.2005.1430580
Filename :
1430580