DocumentCode
315889
Title
Combining loop fusion with prefetching on shared-memory multiprocessors
Author
Manjikian, Naraig
Author_Institution
Dept. of Electr. & Comput. Eng., Toronto Univ., Ont., Canada
fYear
1997
fDate
11-15 Aug 1997
Firstpage
78
Lastpage
82
Abstract
The performance of programs consisting of parallel loops on shared-memory multiprocessors is limited by long memory latencies as processor speeds increase more rapidly than memory speeds. Two complementary techniques for addressing memory latency and improving performance are: (a) cache locality enhancement for latency reduction and (b) data prefetching for latency tolerance. This paper studies the benefit of combining loop fusion for locality enhancement with prefetching. Experimental results are reported for multiprocessors with support for prefetching. For a complete application on an SGI Power Challenge R10000, combining loop fusion with prefetching improves parallel speedup by 46%.
Keywords
cache storage; shared memory systems; software performance evaluation; SGI Power Challenge R10000; cache locality enhancement; data prefetching; latency reduction; long memory latencies; loop fusion; memory latency; parallel loops; prefetching; shared-memory multiprocessors; Concurrent computing; Delay; Filters; Fuses; Hardware; Jacobian matrices; Lapping; Microprocessors; Parallel processing; Prefetching
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel Processing, 1997., Proceedings of the 1997 International Conference on
Conference_Location
Bloomington, IL
ISSN
0190-3918
Print_ISBN
0-8186-8108-X
Type
conf
DOI
10.1109/ICPP.1997.622560
Filename
622560