DocumentCode
2666328
Title
Managing Wire Delay in Large Chip-Multiprocessor Caches
Author
Beckmann, Bradford M. ; Wood, David A.
Author_Institution
University of Wisconsin-Madison
fYear
2004
fDate
04-08 Dec. 2004
Firstpage
319
Lastpage
330
Abstract
In response to increasing (relative) wire delay, architects have proposed various technologies to manage the impact of slow wires on large uniprocessor L2 caches. Block migration (e.g., D-NUCA and NuRapid) reduces average hit latency by migrating frequently used blocks towards the lower-latency banks. Transmission Line Caches (TLC) use on-chip transmission lines to provide low latency to all banks. Traditional stride-based hardware prefetching strives to tolerate, rather than reduce, latency. Chip multiprocessors (CMPs) present additional challenges. First, CMPs often share the on-chip L2 cache, requiring multiple ports to provide sufficient bandwidth. Second, multiple threads mean multiple working sets, which compete for limited on-chip storage. Third, sharing code and data interferes with block migration, since one processor's low-latency bank is another processor's high-latency bank. In this paper, we develop L2 cache designs for CMPs that incorporate these three latency management techniques. We use detailed full-system simulation to analyze the performance trade-offs for both commercial and scientific workloads. First, we demonstrate that block migration is less effective for CMPs because 40-60% of L2 cache hits in commercial workloads are satisfied in the central banks, which are equally far from all processors. Second, we observe that although transmission lines provide low latency, contention for their restricted bandwidth limits their performance. Third, we show stride-based prefetching between L1 and L2 caches alone improves performance by at least as much as the other two techniques. Finally, we present a hybrid design, combining all three techniques, that improves performance by an additional 2% to 19% over prefetching alone.
Keywords
Analytical models; Bandwidth; Delay; Hardware; Performance analysis; Prefetching; Technology management; Transmission lines; Wire; Yarn;
fLanguage
English
Publisher
IEEE
Conference_Titel
Microarchitecture, 2004. MICRO-37 2004. 37th International Symposium on
ISSN
1072-4451
Print_ISBN
0-7695-2126-6
Type
conf
DOI
10.1109/MICRO.2004.21
Filename
1551004