• DocumentCode
    639328
  • Title

    Traffic steering between a low-latency unswitched TL ring and a high-throughput switched on-chip interconnect

  • Author

    Wei Ding ; Jun Liu ; Kandemir, Mahmut ; Irwin, Mary Jane

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA, USA
  • fYear
    2013
  • fDate
    7-11 Sept. 2013
  • Firstpage
    309
  • Lastpage
    318
  • Abstract
    Optimizing cache locality has been important since the emergence of caches, and numerous cache locality optimization schemes have been published in the compiler literature. However, in modern architectures, cache locality is not the only factor that determines memory system performance. Many emerging multicores employ banked memory systems, and each bank has an attached row-buffer that holds the most recently accessed memory row (page). A last-level cache miss that also misses in the row-buffer can experience much higher latency than a cache miss that hits in the row-buffer. Consequently, optimizing for row-buffer locality can be as important as optimizing for cache locality. Targeting emerging multicores and multithreaded applications, this paper presents a compiler-directed row-buffer locality optimization strategy. This strategy modifies the memory layout of data to increase the number of row-buffer hits without increasing the number of misses in the on-chip cache hierarchy. We implemented the proposed optimization strategy in an open-source compiler and tested its effectiveness in improving row-buffer performance using a set of multithreaded applications. Our results indicate that the proposed approach improves the average data access latency by about 29%, which translates, on average, to about a 15% improvement in execution time.
  • Keywords
    cache storage; multi-threading; multiprocessing systems; optimising compilers; cache locality optimization scheme; cache misses; compiler-directed row-buffer locality; data access latency; memory system; multicore system; multithreaded application; on-chip cache hierarchy; open-source compiler; Arrays; Indexes; Instruction sets; Layout; Multicore processing; Optimization; Vectors; network-on-chip; traffic steering; transmission line;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Architectures and Compilation Techniques (PACT), 2013 22nd International Conference on
  • Conference_Location
    Edinburgh
  • ISSN
    1089-795X
  • Print_ISBN
    978-1-4799-1018-2
  • Type

    conf

  • DOI
    10.1109/PACT.2013.6618820
  • Filename
    6618820