• DocumentCode
    2480881
  • Title
    SOLE: Speculative one-cycle load execution with scalability, high-performance and energy-efficiency
  • Author
    Zhenhao Zhang ; Dong Tong ; Xiaoyin Wang ; Jiangfang Yi ; Keyi Wang
  • Author_Institution
    Microprocessor R&D Center, Peking Univ., Beijing, China
  • fYear
    2012
  • fDate
    Sept. 30 2012-Oct. 3 2012
  • Firstpage
    291
  • Lastpage
    296
  • Abstract
    Conventional superscalar processors usually contain a large CAM-based load/store queue (LSQ) with poor scalability and high energy consumption. Recent proposals focus only on improving LSQ scalability to increase in-flight instruction capacity, and yield little improvement in performance or energy efficiency. This paper presents a novel speculative store-load forwarding mechanism named SOLE (speculative one-cycle load execution). First, SOLE uses address identifiers, rather than the exact memory addresses used by a traditional LSQ, to perform memory disambiguation. Because the address identifier is just a simple hash of the address base and offset, speculative store-load forwarding can start earlier, which reduces load execution latency and saves energy by filtering out unnecessary data cache accesses. Second, SOLE enlarges the forwarding communication range by using the SSN (store sequential number) to determine the age order between stores, which further improves performance. Finally, SOLE is implemented entirely with set-associative structures, avoiding the scalability problem of a CAM-based LSQ. Experiments show that SOLE outperforms the traditional LSQ by 13.57% in performance, while loads and stores consume only 75.2% of the execution energy.
  • Keywords
    cache storage; energy conservation; energy consumption; performance evaluation; LSQ scalability; SOLE mechanism; SSN; address identifiers; age order; data cache; energy efficiency; forwarding communication range; in-flight instruction capacity; load execution latency reduce; load-store queue; memory disambiguation; performance improvement; set-associative structures; speculative one-cycle load execution; store sequential number; store-load forwarding mechanism; Benchmark testing; Energy consumption; Optimization; Pipelines; Program processors; Proposals; Scalability;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Design (ICCD), 2012 IEEE 30th International Conference on
  • Conference_Location
    Montreal, QC
  • ISSN
    1063-6404
  • Print_ISBN
    978-1-4673-3051-0
  • Type
    conf
  • DOI
    10.1109/ICCD.2012.6378654
  • Filename
    6378654
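
The mechanism summarized in the abstract (an address identifier hashed from base and offset, SSN-based age ordering, and a set-associative lookup instead of a CAM) can be illustrated with a small software model. This is only a sketch under stated assumptions: the abstract does not give the hash function, the identifier width, the table geometry, or the replacement policy, so `address_id`, `ForwardingTable`, `ID_BITS`, `NUM_SETS`, and `WAYS` below are hypothetical names and values chosen for illustration, not the paper's design.

```python
from dataclasses import dataclass
from typing import List, Optional

ID_BITS = 10   # assumed identifier width (not stated in the abstract)
NUM_SETS = 16  # assumed number of sets
WAYS = 4       # assumed associativity


def address_id(base: int, offset: int) -> int:
    """Hypothetical hash of base and offset; it needs no full address add."""
    return (base ^ offset) & ((1 << ID_BITS) - 1)


@dataclass
class StoreEntry:
    tag: int   # upper identifier bits, compared within one set
    ssn: int   # store sequential number; larger means younger
    data: int  # value the store will write


class ForwardingTable:
    """Set-associative table of in-flight stores, indexed by address identifier."""

    def __init__(self) -> None:
        self.sets: List[List[StoreEntry]] = [[] for _ in range(NUM_SETS)]

    def record_store(self, base: int, offset: int, ssn: int, data: int) -> None:
        ident = address_id(base, offset)
        ways = self.sets[ident % NUM_SETS]
        if len(ways) == WAYS:
            # Assumed replacement policy: evict the oldest store in the set.
            ways.remove(min(ways, key=lambda e: e.ssn))
        ways.append(StoreEntry(ident // NUM_SETS, ssn, data))

    def lookup_load(self, base: int, offset: int, last_older_ssn: int) -> Optional[int]:
        """Return data from the youngest matching store that is older than the load.

        last_older_ssn is the SSN of the most recent store preceding the load
        in program order. None means no speculative forwarding is possible.
        """
        ident = address_id(base, offset)
        tag = ident // NUM_SETS
        candidates = [
            e for e in self.sets[ident % NUM_SETS]
            if e.tag == tag and e.ssn <= last_older_ssn
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda e: e.ssn).data


table = ForwardingTable()
table.record_store(0x1000, 8, ssn=1, data=11)
table.record_store(0x1000, 8, ssn=2, data=22)
forwarded = table.lookup_load(0x1000, 8, last_older_ssn=2)
```

Because the identifier is a hash rather than the exact address, a match in this model is only a prediction: distinct addresses can share an identifier, and the same address reached through a different base and offset pair can map to a different one. Any forwarded value would therefore have to be verified later, which is consistent with the abstract describing the forwarding as speculative.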