DocumentCode :
2424204
Title :
Checkpointed early load retirement
Author :
Kirman, Nevin ; Kirman, Meyrem ; Chaudhuri, Mainak ; Martinez, José F.
Author_Institution :
Comput. Syst. Lab., Cornell Univ., Ithaca, NY, USA
fYear :
2005
fDate :
12-16 Feb. 2005
Firstpage :
16
Lastpage :
27
Abstract :
Long-latency loads are critical in today's processors due to the ever-increasing speed gap with memory. Not only do these loads block the execution of dependent instructions, they also prevent other instructions from moving through the in-order reorder buffer (ROB) and retiring. As a result, the processor quickly fills up with uncommitted instructions, and computation ultimately stalls. To attack this problem, we propose checkpointed early load retirement, a mechanism that combines register checkpointing and back-end (i.e., at retirement) load-value prediction. When a long-latency load hits the ROB head unresolved, the processor enters clear mode by (1) taking a checkpoint of the architectural registers, (2) supplying a load-value prediction to consumers, and (3) early-retiring the long-latency load. This unclogs the ROB, thereby "clearing the way" for subsequent instructions to retire, and also allowing instructions dependent on the long-latency load to execute sooner. When the actual value returns from memory, it is compared against the prediction. A misprediction causes the processor to roll back to the checkpoint, discarding all subsequent computation. The benefits of executing in clear mode come from providing early forward progress on correct predictions, and from warming up caches and other structures on wrong predictions. Our evaluation shows that a clear implementation with support for four checkpoints yields an average speedup of 1.12 for both eleven integer and eight floating-point applications (1.27 and 1.19 for five integer and five floating-point memory-bound applications, respectively), relative to a contemporary out-of-order processor with an aggressive hardware prefetcher.
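The checkpoint, predict, early-retire, and verify sequence described in the abstract can be illustrated with a minimal software sketch. This is a toy model only, not the paper's hardware design: the class `ToyCore`, its method names, the dictionary register file, the choice to refuse a new checkpoint when all four are in use, and the write of the actual value after a rollback are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCore:
    """Hypothetical toy model: architectural registers plus checkpoints."""
    regs: dict
    max_checkpoints: int = 4  # the abstract evaluates four checkpoints
    checkpoints: list = field(default_factory=list)

    def early_retire_load(self, dest, predicted):
        """Model a long-latency load reaching the ROB head unresolved.

        Returns False when no checkpoint is free (assumption of this
        sketch: the core would simply stall in that case).
        """
        if len(self.checkpoints) >= self.max_checkpoints:
            return False
        # (1) take a checkpoint of the architectural registers
        self.checkpoints.append((dest, predicted, dict(self.regs)))
        # (2) supply the predicted value to consumers and
        # (3) treat the load as retired
        self.regs[dest] = predicted
        return True

    def resolve_oldest(self, actual):
        """Compare the value returned from memory with the prediction."""
        dest, predicted, snapshot = self.checkpoints.pop(0)
        if actual == predicted:
            return True  # correct prediction: speculative work is kept
        # Misprediction: roll back to the checkpoint and discard all
        # subsequent computation, including younger checkpoints.
        self.regs = snapshot
        self.checkpoints.clear()
        # Assumption: the load then completes with the real value.
        self.regs[dest] = actual
        return False


# Correct prediction: the dependent result computed early is kept.
good = ToyCore(regs={"r1": 0, "r2": 5})
good.early_retire_load("r1", predicted=10)
good.regs["r2"] = good.regs["r1"] + 1  # dependent instruction runs early
good_ok = good.resolve_oldest(actual=10)

# Wrong prediction: the dependent result is discarded on rollback.
bad = ToyCore(regs={"r1": 0, "r2": 5})
bad.early_retire_load("r1", predicted=10)
bad.regs["r2"] = bad.regs["r1"] + 1
bad_ok = bad.resolve_oldest(actual=7)
```

In the first run the dependent update to `r2` survives because the prediction matched. In the second run the register state reverts to the snapshot, so only the real loaded value remains. The sketch does not model the cache warm-up benefit that the abstract attributes to wrong predictions.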
Keywords :
cache storage; checkpointing; floating point arithmetic; memory architecture; resource allocation; aggressive hardware prefetcher; architectural registers; checkpointed early load retirement; dependent instructions; floating-point applications; load-value prediction; long-latency loads; out-of-order processor; register checkpointing; reorder buffer; uncommitted instructions; Application software; Checkpointing; Computer aided instruction; Hardware; Laboratories; Out of order; Pipelines; Prefetching; Registers; Retirement;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
High-Performance Computer Architecture, 2005. HPCA-11. 11th International Symposium on
ISSN :
1530-0897
Print_ISBN :
0-7695-2275-0
Type :
conf
DOI :
10.1109/HPCA.2005.9
Filename :
1385925