DocumentCode
2041427
Title
Improving cache locality for thread-level speculation
Author
Fung, Stanley L C ; Steffan, J. Gregory
Author_Institution
Dept. of Electr. & Comput. Eng., Toronto Univ., Ont.
fYear
2006
fDate
25-29 April 2006
Abstract
With the advent of chip-multiprocessors (CMPs), thread-level speculation (TLS) remains a promising technique for exploiting this highly multithreaded hardware to improve the performance of an individual program. However, with such speculatively parallel execution the cache locality once enjoyed by the original uniprocessor execution is significantly disrupted: for TLS execution on a four-processor CMP, we find that the data-cache miss rates are nearly four times those of the uniprocessor case, even though TLS execution utilizes four private data caches (i.e., four-fold greater cache capacity). We break down the TLS cache locality problem into instruction and data caches, execution stages, and parallel access patterns, and propose methods to improve cache locality in each of these areas. We find that for parallel regions across 13 SPECint applications our simple and low-cost techniques reduce data-cache misses by 38%, improve performance by 12.8%, and significantly improve scalability, further enhancing the feasibility of TLS as a way to capitalize on future CMPs.
Keywords
cache storage; microprocessor chips; multi-threading; multiprocessing systems; SPECint; cache locality; chip multiprocessors; parallel access patterns; thread-level speculation; Hardware; Program processors; Sun; Throughput; Yarn
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International
Conference_Location
Rhodes Island
Print_ISBN
1-4244-0054-6
Type
conf
DOI
10.1109/IPDPS.2006.1639271
Filename
1639271