Title :
Streamlining the continual flow processor architecture with fast replay loop
Author :
Jothi, Komal; Akkary, H.
Author_Institution :
Dept. of Electr. & Comput. Eng., American Univ. of Beirut, Beirut, Lebanon
Abstract :
We present a streamlined continual flow processor architecture for scheduling instructions behind loads that miss the data cache. Instructions that do not encounter cache misses execute quickly, releasing their allocated hardware resources for other instructions. Instructions that depend on data cache misses wait in their reservation stations for the data, as long as the reservation stations are not full. If the reservation stations become full and block the pipeline, instructions dependent on cache misses give back their reservation stations and move directly into a large single-ported SRAM waiting buffer, without having to go through pseudo execution and commit in the reorder buffer as required by previous continual flow architectures. Afterwards, when the missing data cache block is fetched, these instructions are replayed from the waiting buffer, i.e., re-inserted into the reservation stations to be scheduled for execution. Shortening the replay loop by removing the reorder buffer and the pseudo execution and commit from the replay path improves performance on benchmarks with a large number of loads that miss the L1 but hit the on-chip L2 data cache. Performance measurements using the SimpleScalar microarchitecture simulator and SPEC 2006 benchmarks show that our streamlined continual flow pipeline architecture outperforms the conventional continual flow pipeline architecture by 16% on average.
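The replay mechanism described in the abstract can be illustrated with a toy model. This is only a minimal sketch under strong assumptions, not the authors' design or their SimpleScalar implementation: the names ToyPipeline, Instr, dispatch, and miss_returns are hypothetical, a single outstanding cache miss is assumed, and the SRAM waiting buffer is modeled as a plain FIFO queue with no timing.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Instr:
    name: str
    miss_dependent: bool  # True while the instruction waits on a load that missed


@dataclass
class ToyPipeline:
    """Hypothetical toy model: reservation stations plus a FIFO waiting buffer."""
    rs_capacity: int
    rs: list = field(default_factory=list)                # reservation stations
    waiting_buffer: deque = field(default_factory=deque)  # stand-in for the SRAM buffer
    executed: list = field(default_factory=list)

    def dispatch(self, instr: Instr) -> bool:
        # When the stations are full, miss-dependent instructions give their
        # stations back and move straight to the waiting buffer.
        if len(self.rs) >= self.rs_capacity:
            for waiting in [i for i in self.rs if i.miss_dependent]:
                self.waiting_buffer.append(waiting)
            self.rs = [i for i in self.rs if not i.miss_dependent]
        if len(self.rs) >= self.rs_capacity:
            return False  # still full: the pipeline stalls in this toy model
        self.rs.append(instr)
        return True

    def execute_ready(self) -> None:
        # Instructions that do not depend on the miss execute and free their stations.
        self.executed.extend(i.name for i in self.rs if not i.miss_dependent)
        self.rs = [i for i in self.rs if i.miss_dependent]

    def miss_returns(self) -> None:
        # The missing block arrives: waiters in the stations become ready, and
        # buffered instructions are replayed, i.e. re-inserted into the stations.
        for i in self.rs:
            i.miss_dependent = False
        while self.waiting_buffer and len(self.rs) < self.rs_capacity:
            replayed = self.waiting_buffer.popleft()
            replayed.miss_dependent = False
            self.rs.append(replayed)


def demo() -> list:
    p = ToyPipeline(rs_capacity=2)
    for ins in (Instr("a", True), Instr("b", True), Instr("c", False)):
        p.dispatch(ins)
    p.execute_ready()   # the independent instruction runs first
    p.miss_returns()    # the miss data arrives; a and b are replayed
    p.execute_ready()
    return p.executed
```

In this sketch, calling demo() lets the independent instruction c execute while a and b sit in the waiting buffer, and a and b execute only after the replay. Note that the path from the waiting buffer back to the reservation stations involves no reorder buffer step, which is the shortening the abstract refers to.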
Keywords :
SRAM chips; cache storage; pipeline processing; SimpleScalar microarchitecture simulator; Spec 2006 benchmarks; data cache block; fast replay loop; hardware resources; onchip L2 data cache; pseudo execution; reservation station; scheduling instructions; single ported SRAM; streamlined continual flow pipeline architecture; streamlined continual flow processor architecture; Arrays; Benchmark testing; Hardware; Pipelines; Random access memory; Registers; continual flow pipelines; instruction level parallelism; latency tolerant processors; superscalar processors; virtual register renaming;
Conference_Title :
EUROCON, 2013 IEEE
Conference_Location :
Zagreb
Print_ISBN :
978-1-4673-2230-0
DOI :
10.1109/EUROCON.2013.6625224