DocumentCode :
3209250
Title :
Scalable load and store processing in latency tolerant processors
Author :
Gandhi, Amit ; Akkary, Haitham ; Rajwar, Ravi ; Srinivasan, S.T. ; Lai, Konrad
Author_Institution :
Electr. & Comput. Eng., Portland State Univ., OR, USA
fYear :
2005
fDate :
4-8 June 2005
Firstpage :
446
Lastpage :
457
Abstract :
Memory latency tolerant architectures support thousands of in-flight instructions without scaling cycle-critical processor resources, and thousands of useful instructions can complete in parallel with a miss to memory. These architectures, however, require large queues to track all loads and stores executed while a miss is pending. Hierarchical designs alleviate the cycle-time impact of these structures, but the content-addressable memory (CAM) and search functions required to enforce memory ordering and provide data forwarding place high demands on area and power. We present new load-store processing algorithms for latency tolerant architectures. We augment the primary load and store queues with secondary buffers. The secondary load buffer is a set-associative structure, similar to a cache. The secondary store buffer, the Store Redo Log (SRL), is a first-in first-out structure that records the program order of all stores completed in parallel with a miss, and has no CAM or search functions. Instead of a secondary store queue, a cache provides temporary forwarding. The SRL enforces memory ordering by ensuring that memory updates occur in program order once the miss returns. The new algorithms eliminate the CAM and search functions in the secondary load and store buffers, and remove fundamental sources of complexity, power, and area inefficiency in load/store processing. The new organization, while area and power efficient, is competitive in performance with hierarchical designs.
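The store path described in the abstract can be pictured with a toy software model. The sketch below is only an illustration under stated assumptions, not the authors' hardware design: the class name `ToyStoreRedoLog`, its methods, and the use of a Python dict as a stand-in for the forwarding cache are all hypothetical, and the secondary load buffer and the primary queues are left out entirely.

```python
from collections import deque


class ToyStoreRedoLog:
    """Toy model of a FIFO store log plus a forwarding cache (illustrative only)."""

    def __init__(self, memory):
        self.memory = memory      # assumed architectural memory: addr -> value
        self.srl = deque()        # FIFO of (addr, value) kept in program order
        self.forwarding = {}      # hypothetical stand-in for the forwarding cache

    def store(self, addr, value):
        # Append in program order; the log itself is never searched by address.
        self.srl.append((addr, value))
        # Make the newest value visible to later loads while the miss is pending.
        self.forwarding[addr] = value

    def load(self, addr):
        # Address-indexed lookup instead of an associative search of a store queue.
        if addr in self.forwarding:
            return self.forwarding[addr]
        return self.memory.get(addr, 0)

    def miss_returned(self):
        # Drain oldest-first so memory updates are applied in program order.
        while self.srl:
            addr, value = self.srl.popleft()
            self.memory[addr] = value
        self.forwarding.clear()
```

In this model a load issued while the miss is pending sees the youngest store to its address through the forwarding structure, while memory itself is updated only when `miss_returned()` drains the log.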
Keywords :
cache storage; computer aided manufacturing; instruction sets; memory architecture; parallel processing; program processors; queueing theory; CAM; Store Redo Log; cycle-critical processor resources; data forwarding; latency tolerant processors; load-store processing algorithms; memory latency tolerant architectures; memory ordering; search functions; secondary load buffer; secondary store queue; set associative structure; Buffer storage; CADCAM; Computer aided manufacturing; Computer architecture; Delay; Memory architecture; Pipelines; Processor scheduling; Proposals; Registers;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Architecture, 2005. ISCA '05. Proceedings. 32nd International Symposium on
ISSN :
1063-6897
Print_ISBN :
0-7695-2270-X
Type :
conf
DOI :
10.1109/ISCA.2005.46
Filename :
1431577