DocumentCode :
2571127
Title :
Balancing Locality and Parallelism on Shared-Cache Multi-Core Systems
Author :
Cade, Michael Jason ; Qasem, Apan
Author_Institution :
Texas State Univ., San Marcos, TX, USA
fYear :
2009
fDate :
25-27 June 2009
Firstpage :
188
Lastpage :
195
Abstract :
The emergence of multi-core systems opens new opportunities for thread-level parallelism and dramatically increases the performance potential of applications running on these systems. However, the state of the art in performance-enhancing software falls far short of fully exploiting the hardware features of this complex new architecture, and much of the performance capability of multi-core systems therefore remains unrealized. This research addresses one facet of the problem by exploring the relationship between data locality and parallelism on multi-core architectures in which one or more levels of cache are shared among the cores. A model is presented for determining a profitable synchronization interval for concurrent threads that interact in a producer-consumer relationship. Experimental results suggest that considering the synchronization window, i.e., the amount of work individual threads are allowed to do between synchronizations, enables performance optimizations that are both parallelism- and locality-aware. The optimum synchronization window is a function of the number of threads, the data reuse patterns within the workload, and the size and configuration of the last-level cache shared among the processing units. By accounting for these factors, the calculation of the optimum synchronization window balances parallelism and data locality for maximum performance.
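The abstract's central idea, producer and consumer threads that synchronize after a bounded amount of in-flight work so the shared working set stays resident in the last-level cache, can be illustrated with a minimal sketch. This is not the paper's model: the bounded-queue pipeline and the `sync_window` heuristic (including its `cache_bytes`, `item_bytes`, and `reuse` parameters) are illustrative assumptions only.

```python
import threading
import queue


def sync_window(cache_bytes, n_threads, item_bytes, reuse):
    """Hypothetical heuristic (not the paper's model): size the
    synchronization window so each thread's in-flight data fits in
    its share of the shared last-level cache."""
    return max(1, cache_bytes // (n_threads * item_bytes * reuse))


def run_pipeline(n_items, window):
    """Producer-consumer pipeline whose bounded queue caps how far the
    producer may run ahead: at most `window` items are in flight, so
    the consumer touches data while it is still cache-hot."""
    buf = queue.Queue(maxsize=window)  # capacity = synchronization window
    consumed = []

    def producer():
        for i in range(n_items):
            buf.put(i * i)     # blocks once the window is full
        buf.put(None)          # sentinel: no more data

    def consumer():
        while True:
            item = buf.get()
            if item is None:
                break
            consumed.append(item)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed
```

As the abstract notes, the profitable window depends on thread count, reuse, and cache size; with these assumed parameters, e.g. a 4 MiB shared cache, 4 threads, 64-byte items, and a reuse factor of 4, the heuristic yields a window of 4096 items.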
Keywords :
cache storage; concurrency control; multi-threading; parallel processing; shared memory systems; concurrent threads; data locality; data reuse pattern; locality-aware performance optimization; multicore architecture; multicore system; optimum synchronization window; parallelism-aware performance optimization; producer-consumer relationship; shared cache; thread-level parallelism; Computer architecture; Costs; Energy consumption; Frequency synchronization; Hardware; Parallel processing; Power system modeling; Software performance; Throughput; Yarn; memory hierarchy optimization; parallelism; performance tuning; shared-cache;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
11th IEEE International Conference on High Performance Computing and Communications (HPCC '09), 2009
Conference_Location :
Seoul
Print_ISBN :
978-1-4244-4600-1
Electronic_ISBN :
978-0-7695-3738-2
Type :
conf
DOI :
10.1109/HPCC.2009.61
Filename :
5166993