• DocumentCode
    3557920
  • Title
    Prefetching with Helper Threads for Loosely Coupled Multiprocessor Systems
  • Author
    Lee, Jaejin; Jung, Changhee; Lim, Daeseob; Solihin, Yan
  • Author_Institution
    Sch. of Comput. Sci. & Eng., Seoul Nat. Univ., Seoul, South Korea
  • Volume
    20
  • Issue
    9
  • fYear
    2009
  • Firstpage
    1309
  • Lastpage
    1324
  • Abstract
    This paper presents a helper thread prefetching scheme designed to work on loosely coupled processors, such as those in a standard chip multiprocessor (CMP) system or an intelligent memory system. Loosely coupled processors have the advantage that resources such as the processor and L1 cache are not contended for by the application and helper threads, which preserves the speed of the application. However, interprocessor communication is expensive in such a system. We present techniques to alleviate this cost. Our approach exploits large loop-based code regions and is based on a new synchronization mechanism between the application and helper threads. This mechanism precisely controls how far ahead of the application thread the helper thread may execute. We found that this control is important for ensuring prefetching timeliness and avoiding cache pollution. To demonstrate that prefetching in a loosely coupled system can be done effectively, we evaluate our scheme by simulating a standard unmodified CMP system and an intelligent memory system in which a simple processor in memory executes the helper thread. Evaluated on nine memory-intensive applications with the memory processor in DRAM, our scheme achieves an average speedup of 1.25. Moreover, our scheme works well in combination with a conventional processor-side sequential L1 prefetcher, resulting in an average speedup of 1.31. In a standard CMP, the scheme achieves an average speedup of 1.33. On a real CMP system with an L2 cache shared between two cores, our helper thread prefetching combined with hardware L2 prefetching achieves an average speedup of 1.15 over hardware L2 prefetching alone, for the subset of applications with high L2 cache misses per cycle.
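    The abstract's central idea, bounding how far the helper thread may run ahead of the application thread, can be illustrated with a minimal sketch. This is not the authors' mechanism: it assumes a simple counting semaphore holding `max_distance` run-ahead credits, and the function name `run_with_bounded_helper` and the array read standing in for a prefetch are hypothetical. The helper spends one credit per loop iteration it touches, and the application returns one credit per iteration it finishes, so the helper can never be more than `max_distance` iterations ahead.

```python
import threading


def run_with_bounded_helper(data, max_distance):
    """Hypothetical sketch: the application loop sums `data` while a helper
    thread touches elements ahead of it (a stand-in for prefetching), never
    more than `max_distance` iterations ahead. Returns (total, max_lead)."""
    n = len(data)
    credits = threading.Semaphore(max_distance)  # run-ahead budget
    lock = threading.Lock()
    progress = {"helper": 0, "app": 0, "max_lead": 0}

    def helper():
        for i in range(n):
            credits.acquire()  # blocks while already max_distance ahead
            _ = data[i]        # stand-in for issuing a prefetch of data[i]
            with lock:
                progress["helper"] = i + 1
                lead = progress["helper"] - progress["app"]
                progress["max_lead"] = max(progress["max_lead"], lead)

    t = threading.Thread(target=helper)
    t.start()
    total = 0
    for i in range(n):          # the application thread never waits on the helper
        total += data[i]
        with lock:
            progress["app"] = i + 1
        credits.release()       # iteration i done: helper may advance one more
    t.join()
    return total, progress["max_lead"]


if __name__ == "__main__":
    total, lead = run_with_bounded_helper(list(range(1000)), 8)
    print(total, lead)
```

    The bound holds because the helper's (i+1)-th acquire requires at least i+1-max_distance releases, each of which follows an update of the application's progress counter. A real system would tune this distance so that prefetched data arrives in time without evicting data the application still needs.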
  • Keywords
    DRAM chips; cache storage; microprocessor chips; multi-threading; DRAM; helper threads; intelligent memory system; interprocessor communication; loosely coupled multiprocessor systems; memory processor; memory-intensive applications; prefetching; standard chip multiprocessor system; Cache memories; Concurrent Programming; Helper thread; Multi-core/single-chip multiprocessors; chip multiprocessors; prefetching; processing-in-memory system
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • Conference_Location
    10/10/2008
  • ISSN
    1045-9219
  • Type
    jour
  • DOI
    10.1109/TPDS.2008.224
  • Filename
    4641920