• DocumentCode
    2571127
  • Title

    Balancing Locality and Parallelism on Shared-cache Multi-core Systems

  • Author

    Cade, Michael Jason ; Qasem, Apan

  • Author_Institution
    Texas State Univ., San Marcos, TX, USA
  • fYear
    2009
  • fDate
    25-27 June 2009
  • Firstpage
    188
  • Lastpage
    195
  • Abstract
    The emergence of multi-core systems opens new opportunities for thread-level parallelism and dramatically increases the performance potential of applications running on these systems. However, the state of the art in performance-enhancing software is far from adequate with regard to the exploitation of hardware features on this complex new architecture. As a result, many of the performance capabilities of multi-core systems have yet to be realized. This research addresses one facet of this problem by exploring the relationship between data locality and parallelism in the context of multi-core architectures where one or more levels of cache are shared among the different cores. A model is presented for determining a profitable synchronization interval for concurrent threads that interact in a producer-consumer relationship. Experimental results suggest that consideration of the synchronization window, or the amount of work individual threads can be allowed to do between synchronizations, allows for parallelism- and locality-aware performance optimizations. The optimum synchronization window is a function of the number of threads, data reuse patterns within the workload, and the size and configuration of the last level of cache that is shared among processing units. By considering these factors, the calculation of the optimum synchronization window incorporates parallelism and data locality issues for maximum performance.
  • Keywords
    cache storage; concurrency control; multi-threading; parallel processing; shared memory systems; concurrent threads; data locality; data reuse pattern; locality-aware performance optimization; multicore architecture; multicore system; optimum synchronization window; parallelism-aware performance optimization; producer-consumer relationship; shared cache; thread-level parallelism; Computer architecture; Costs; Energy consumption; Frequency synchronization; Hardware; Parallel processing; Power system modeling; Software performance; Throughput; Yarn; memory hierarchy optimization; parallelism; performance tuning; shared-cache;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing and Communications, 2009. HPCC '09. 11th IEEE International Conference on
  • Conference_Location
    Seoul
  • Print_ISBN
    978-1-4244-4600-1
  • Electronic_ISBN
    978-0-7695-3738-2
  • Type

    conf

  • DOI
    10.1109/HPCC.2009.61
  • Filename
    5166993