DocumentCode
3428607
Title
Streamlining the continual flow processor architecture with fast replay loop
Author
Jothi, Komal ; Akkary, H.
Author_Institution
Dept. of Electr. & Comput. Eng., American Univ. of Beirut, Beirut, Lebanon
fYear
2013
fDate
1-4 July 2013
Firstpage
1821
Lastpage
1828
Abstract
We present a streamlined continual flow processor architecture for scheduling instructions behind loads that miss the data cache. Instructions that do not encounter cache misses execute quickly, releasing their allocated hardware resources for other instructions. Instructions that depend on data cache misses wait in their reservation stations for the data, as long as the reservation stations are not full. If the reservation stations become full and block the pipeline, instructions dependent on cache misses give up their reservation stations and move directly into a large single-ported SRAM waiting buffer, without having to go through pseudo execution and commit in the reorder buffer as required by previous continual flow architectures. Later, when the missing data cache block is fetched, these instructions are replayed from the waiting buffer, i.e., re-inserted into the reservation stations to be scheduled for execution. Shortening the replay loop by removing the reorder buffer and the pseudo execute and commit steps from the replay path improves performance on benchmarks with a large number of loads that miss the L1 but hit the on-chip L2 data cache. Performance measurements using the SimpleScalar microarchitecture simulator and SPEC 2006 benchmarks show that our streamlined continual flow pipeline architecture outperforms the conventional continual flow pipeline architecture by 16% on average.
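The policy described in the abstract can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not the authors' design or their SimpleScalar implementation: the names `Instr`, `ToyCore`, `rs_capacity`, `dispatch`, `execute_ready` and `miss_returns` are hypothetical, a single outstanding miss is modeled, a FIFO queue stands in for the SRAM waiting buffer, and timing, register renaming and the reorder buffer are ignored.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Instr:
    name: str
    waits_on_miss: bool = False  # True if the instruction depends on an outstanding cache miss


@dataclass
class ToyCore:
    rs_capacity: int                               # hypothetical number of reservation stations
    rs: list = field(default_factory=list)         # instructions currently holding a station
    waiting: deque = field(default_factory=deque)  # FIFO standing in for the SRAM waiting buffer
    executed: list = field(default_factory=list)   # names of executed instructions, in order

    def dispatch(self, instr):
        """Place an instruction in a station; if full, first evict miss-dependent entries."""
        if len(self.rs) >= self.rs_capacity:
            self._drain_miss_dependents()
        if len(self.rs) >= self.rs_capacity:
            return False  # nothing could be evicted: the pipeline stalls
        self.rs.append(instr)
        return True

    def _drain_miss_dependents(self):
        # Miss-dependent instructions give up their stations and move to the waiting buffer.
        blocked = [i for i in self.rs if i.waits_on_miss]
        self.rs = [i for i in self.rs if not i.waits_on_miss]
        self.waiting.extend(blocked)

    def execute_ready(self):
        # Instructions that do not depend on a miss execute and release their stations.
        ready = [i for i in self.rs if not i.waits_on_miss]
        self.rs = [i for i in self.rs if i.waits_on_miss]
        self.executed.extend(i.name for i in ready)

    def miss_returns(self):
        # Assumption: one outstanding miss, so its return unblocks every waiting instruction.
        for i in self.rs:
            i.waits_on_miss = False
        # Replay: re-insert buffered instructions into free stations, oldest first.
        while self.waiting and len(self.rs) < self.rs_capacity:
            instr = self.waiting.popleft()
            instr.waits_on_miss = False
            self.rs.append(instr)
```

For example, with two stations, dispatching two miss-dependent instructions followed by an independent one moves the first two into the waiting buffer so that the independent instruction can execute; calling `miss_returns()` then replays the buffered instructions into the stations, where `execute_ready()` can issue them.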
Keywords
SRAM chips; cache storage; pipeline processing; SimpleScalar microarchitecture simulator; SPEC 2006 benchmarks; data cache block; fast replay loop; hardware resources; on-chip L2 data cache; pseudo execution; reservation station; scheduling instructions; single-ported SRAM; streamlined continual flow pipeline architecture; streamlined continual flow processor architecture; Arrays; Benchmark testing; Hardware; Pipelines; Random access memory; Registers; continual flow pipelines; instruction level parallelism; latency tolerant processors; superscalar processors; virtual register renaming
fLanguage
English
Publisher
ieee
Conference_Titel
EUROCON, 2013 IEEE
Conference_Location
Zagreb
Print_ISBN
978-1-4673-2230-0
Type
conf
DOI
10.1109/EUROCON.2013.6625224
Filename
6625224