• DocumentCode
    5176
  • Title
    Designing a Physical Locality Aware Coherence Protocol for Chip-Multiprocessors
  • Author
    Fensch, Christian; Barrow-Williams, N.; Mullins, R.D.; Moore, Steven
  • Author_Institution
    Sch. of Inf., Univ. of Edinburgh, Edinburgh, UK
  • Volume
    62
  • Issue
    5
  • fYear
    2013
  • fDate
    May 2013
  • Firstpage
    914
  • Lastpage
    928
  • Abstract
    Many-core architectures provide an efficient way of harnessing the growing numbers of transistors available. However, energy and latency costs of communication increasingly limit the parallel programs running on these platforms. Existing designs provide a functional communication layer, but not necessarily the most efficient solution. Due to power limitations, efficiency is now a primary concern that motivates us to look again at cache coherence. First, we analyze the communication behavior of parallel applications. The observed sharing patterns reveal considerable locality of shared data accesses between threads with consecutive IDs. This pattern corresponds to strong physical locality between adjacent cores in a chip-multiprocessor (CMP). This paper explores the design of Proximity Coherence: a novel scheme in which L1 load misses are optimistically forwarded to nearby caches via new dedicated links. We exploit these patterns and improve the efficiency of communication. The results show that careful analysis leads to the design of a more efficient coherence protocol. The protocol reduces the latency of load misses by up to 33 percent (17 percent, on average), improving overall execution time by up to 13 percent. Furthermore, it also reduces network-on-chip traffic by 19 percent and energy consumption by up to 30 percent.
  • Keywords
    cache storage; microprocessor chips; multiprocessing systems; network-on-chip; parallel architectures; parallel programming; power aware computing; protocols; CMP; cache proximity coherence design; chip-multiprocessors; communication efficiency improvement; communication energy cost; communication latency cost; energy consumption reduction; execution time improvement; functional communication layer; many-core architectures; network-on-chip traffic reduction; optimistically forwarded L1 load misses; parallel application communication behavior; parallel programs; physical locality aware coherence protocol design; shared data access; Central Processing Unit; Coherence; Computers; Educational institutions; Energy consumption; Protocols; Transistors; Proximity coherence; cache design; physical locality
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type
    jour
  • DOI
    10.1109/TC.2012.52
  • Filename
    6158638