• DocumentCode
    315889
  • Title

    Combining loop fusion with prefetching on shared-memory multiprocessors

  • Author

    Manjikian, Naraig

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Toronto Univ., Ont., Canada
  • fYear
    1997
  • fDate
    11-15 Aug 1997
  • Firstpage
    78
  • Lastpage
    82
  • Abstract
    The performance of programs consisting of parallel loops on shared-memory multiprocessors is limited by long memory latencies as processor speeds increase more rapidly than memory speeds. Two complementary techniques for addressing memory latency and improving performance are: (a) cache locality enhancement for latency reduction and (b) data prefetching for latency tolerance. This paper studies the benefit of combining loop fusion for locality enhancement with prefetching. Experimental results are reported for multiprocessors with support for prefetching. For a complete application on an SGI Power Challenge R10000, combining loop fusion with prefetching improves parallel speedup by 46%.
  • Keywords
    cache storage; shared memory systems; software performance evaluation; SGI Power Challenge R10000; cache locality enhancement; data prefetching; latency reduction; long memory latencies; loop fusion; memory latency; parallel loops; prefetching; shared-memory multiprocessors; Concurrent computing; Delay; Filters; Fuses; Hardware; Jacobian matrices; Lapping; Microprocessors; Parallel processing; Prefetching
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing, 1997., Proceedings of the 1997 International Conference on
  • Conference_Location
    Bloomington, IL
  • ISSN
    0190-3918
  • Print_ISBN
    0-8186-8108-X
  • Type

    conf

  • DOI
    10.1109/ICPP.1997.622560
  • Filename
    622560
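
  • Illustration
    The two techniques named in the abstract can be sketched in a few lines of C. This is a minimal illustration only, not code from the paper: the function names, the array computation, and the prefetch distance PREFETCH_DIST are all hypothetical, and __builtin_prefetch is a GCC/Clang builtin used here as a stand-in for whatever prefetch support the studied machines provide. Parallelization of the loops is omitted to keep the sketch short.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in elements. A real value would be
   tuned to the memory latency and loop body of the target machine. */
#define PREFETCH_DIST 16

/* Unfused form: two separate passes over a[] and b[]. For large n,
   the data touched in the first loop may already have been evicted
   from the cache by the time the second loop reads it. */
void unfused(const double *a, double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        b[i] = 2.0 * a[i];
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Fused form with software prefetching. Fusion lets a[i] and b[i] be
   reused while still in cache (latency reduction); the prefetch
   requests a future element of a[] early so that its miss overlaps
   with computation (latency tolerance). */
void fused_prefetch(const double *a, double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DIST < n)
            __builtin_prefetch(&a[i + PREFETCH_DIST], 0, 1); /* read, low temporal locality */
        b[i] = 2.0 * a[i];
        c[i] = a[i] + b[i];
    }
}
```

    Both functions compute the same results; only the memory access pattern differs. Fusion is legal here because each iteration of the second loop depends only on the same iteration of the first.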