Title :
Scalable Load and Store Processing in Latency-Tolerant Processors
Author :
Gandhi, Amit ; Akkary, Haitham ; Rajwar, Ravi ; Srinivasan, Srikanth T. ; Lai, Konrad
Author_Institution :
Platform Validation & Enabling Group, Intel Corp., Hillsboro, OR
Abstract :
Memory latency-tolerant architectures achieve high performance by supporting thousands of in-flight instructions without scaling cycle-critical processor resources. We present new load-store processing algorithms for latency-tolerant architectures. We augment primary load and store queues with secondary buffers. The secondary load buffer is a set-associative structure, similar to a cache. The secondary store queue, the store redo log (SRL), is a first-in first-out (FIFO) structure that records the program order of all stores completed in parallel with a miss, and has no CAM and search functions. Instead of the secondary store queue, a cache provides temporary forwarding. The SRL enforces memory ordering by ensuring memory updates occur in program order once the miss data arrives from memory. The new algorithms remove fundamental sources of power and area inefficiency in load and store processing by eliminating the CAM and search functions in the secondary load and store buffers, and still achieve competitive performance compared to hierarchical designs.
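The division of labor described in the abstract, a FIFO that only records store order plus a separate cache that handles forwarding, can be illustrated with a small software model. This is a hedged sketch, not the paper's hardware design: the class name `StoreRedoLog`, its methods, and the use of a Python dict as a stand-in for the forwarding cache are all assumptions made for illustration.

```python
from collections import deque


class StoreRedoLog:
    """Toy model (hypothetical): an order-only FIFO plus a forwarding cache."""

    def __init__(self):
        # Program-order FIFO of (address, value); it is appended to and
        # drained from the head, but never searched by address.
        self.log = deque()
        # Stand-in for the forwarding cache: indexed lookup by address,
        # holding the youngest value stored while the miss is outstanding.
        self.forward_cache = {}

    def record_store(self, address, value):
        """A store completes in the shadow of a miss: log it and cache it."""
        self.log.append((address, value))
        self.forward_cache[address] = value

    def load(self, address, memory):
        """Loads get forwarded data from the cache, not by scanning the log."""
        if address in self.forward_cache:
            return self.forward_cache[address]
        return memory.get(address, 0)

    def drain(self, memory):
        """Once the miss data arrives, apply logged stores in program order."""
        while self.log:
            address, value = self.log.popleft()
            memory[address] = value
        self.forward_cache.clear()
        return memory


memory = {0x10: 1}
srl = StoreRedoLog()
srl.record_store(0x10, 5)
srl.record_store(0x20, 7)
srl.record_store(0x10, 9)
forwarded = srl.load(0x10, memory)   # served by the forwarding cache
srl.drain(memory)                    # memory updated in program order
```

In this model the log is only ever touched at its two ends, which mirrors the abstract's point that ordering can be enforced without an associative search over buffered stores.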
Keywords :
cache storage; data structures; instruction sets; memory architecture; multiprocessing systems; first-in first-out structure; in-flight instructions; latency-tolerant processors; load-store processing algorithms; memory latency tolerant architectures; primary load queue; primary store queue; secondary load buffer; secondary store queue; store redo log; Algorithm design and analysis; Buffer storage; CADCAM; Computer aided manufacturing; Delay; Memory architecture; Pipelines; Processor scheduling; Proposals; Registers; CAM; Latency-tolerant processors; load and store
Journal_Title :
Micro, IEEE