• DocumentCode
    3428607
  • Title
    Streamlining the continual flow processor architecture with fast replay loop
  • Author
    Jothi, Komal; Akkary, H.
  • Author_Institution
    Dept. of Electr. & Comput. Eng., American Univ. of Beirut, Beirut, Lebanon
  • fYear
    2013
  • fDate
    1-4 July 2013
  • Firstpage
    1821
  • Lastpage
    1828
  • Abstract
    We present a streamlined continual flow processor architecture for scheduling instructions behind loads that miss the data cache. Instructions that do not encounter cache misses execute quickly, releasing their allocated hardware resources for other instructions. Instructions that depend on data cache misses wait in their reservation stations for the data, as long as the reservation stations are not full. If the reservation stations become full, blocking the pipeline, instructions dependent on cache misses give back their reservation stations and move directly into a large single-ported SRAM waiting buffer, without having to go through pseudo execution and commit in the reorder buffer as required by previous continual flow architectures. Afterwards, when the missed data cache block is fetched, these instructions are replayed from the waiting buffer, i.e., re-inserted into the reservation stations to be scheduled for execution. Shortening the replay loop by removing the reorder buffer and the pseudo execution and commit from the replay path improves performance on benchmarks with a large number of loads that miss the L1 but hit the on-chip L2 data cache. Performance measurements using the SimpleScalar microarchitecture simulator and Spec 2006 benchmarks show that our streamlined continual flow pipeline architecture outperforms a conventional continual flow pipeline architecture by 16% on average.
  • Keywords
    SRAM chips; cache storage; pipeline processing; SimpleScalar microarchitecture simulator; Spec 2006 benchmarks; data cache block; fast replay loop; hardware resources; onchip L2 data cache; pseudo execution; reservation station; scheduling instructions; single ported SRAM; streamlined continual flow pipeline architecture; streamlined continual flow processor architecture; Arrays; Benchmark testing; Hardware; Pipelines; Random access memory; Registers; continual flow pipelines; instruction level parallelism; latency tolerant processors; superscalar processors; virtual register renaming;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    EUROCON, 2013 IEEE
  • Conference_Location
    Zagreb
  • Print_ISBN
    978-1-4673-2230-0
  • Type
    conf
  • DOI
    10.1109/EUROCON.2013.6625224
  • Filename
    6625224