• DocumentCode
    2010544
  • Title
    A Two-Level Load/Store Queue Based on Execution Locality
  • Author
    Pericas, Miquel ; Cristal, Adrian ; Cazorla, Francisco J. ; Gonzalez, R. ; Veidenbaum, Alex ; Jimenez, D.A. ; Valero, Mateo
  • Author_Institution
    Univ. Politec. de Catalunya, Barcelona
  • fYear
    2008
  • fDate
    21-25 June 2008
  • Firstpage
    25
  • Lastpage
    36
  • Abstract
    Multicore processors have emerged as a powerful platform on which to efficiently exploit thread-level parallelism (TLP). However, due to Amdahl's law, such designs will be increasingly limited by the remaining sequential components of applications. To overcome this limitation, it is necessary to design processors with many lower-performance cores for TLP and some high-performance cores designed to execute sequential algorithms. Such cores will need to address the memory wall by implementing kilo-instruction windows. Large-window processors require large load/store queues that would be too slow if implemented using current CAM-based designs. This paper proposes an epoch-based load/store queue (ELSQ), a new design based on execution locality. It is integrated into a large-window processor that has a fast, out-of-order core operating only on L1/L2 cache hits and N slower cores that process L2 misses and their dependent instructions. The large LSQ is coupled with the slow cores and is partitioned into N small, local LSQs, one per core. We evaluate ELSQ in a large-window environment and find that it enables high performance at low power. By exploiting locality among loads and stores, ELSQ outperforms even an idealized central LSQ when implemented on top of a decoupled processor design.
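The abstract describes splitting one large LSQ into N small, local LSQs, one per slow core. The Python sketch below is a toy model of that partitioning idea only, not the ELSQ design from the paper: the class names (`LocalLSQ`, `PartitionedLSQ`), the use of a global sequence number for program order, and the rule that a load searches only its own core's queue for store-to-load forwarding are all assumptions for illustration. It deliberately ignores memory dependences that cross partitions, which a real design must still detect.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StoreEntry:
    seq: int    # hypothetical global program-order sequence number
    addr: int
    value: int


@dataclass
class LocalLSQ:
    """Hypothetical small store queue owned by one slow core."""
    stores: List[StoreEntry] = field(default_factory=list)

    def insert_store(self, seq: int, addr: int, value: int) -> None:
        self.stores.append(StoreEntry(seq, addr, value))

    def forward(self, load_seq: int, addr: int) -> Optional[int]:
        # Search only this partition: pick the youngest store that is
        # older than the load and writes the same address.
        best: Optional[StoreEntry] = None
        for st in self.stores:
            if st.addr == addr and st.seq < load_seq:
                if best is None or st.seq > best.seq:
                    best = st
        return None if best is None else best.value


class PartitionedLSQ:
    """N local queues instead of one large, centralized CAM-searched queue."""

    def __init__(self, n_cores: int) -> None:
        self.parts = [LocalLSQ() for _ in range(n_cores)]

    def store(self, core: int, seq: int, addr: int, value: int) -> None:
        self.parts[core].insert_store(seq, addr, value)

    def load(self, core: int, seq: int, addr: int, memory: Dict[int, int]) -> int:
        # Forward from the local partition if possible, else read memory.
        # Stores held in other partitions are NOT checked (sketch limitation).
        v = self.parts[core].forward(seq, addr)
        return memory.get(addr, 0) if v is None else v
```

For example, after `lsq = PartitionedLSQ(2)`, `lsq.store(0, 1, 0x100, 7)` and `lsq.store(0, 3, 0x100, 9)`, a load `lsq.load(0, 4, 0x100, {})` forwards 9 from the youngest older local store, while a load on core 1 to the same address falls through to memory. The point of the partitioning is that each associative search covers only one small queue.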
  • Keywords
    cache storage; parallel processing; Amdahl's law; CAM-based designs; L1 cache; L2 cache; epoch-based load store queue; execution locality; kilo-instruction windows; memory-wall; multicore processors; processor design; sequential algorithms; thread-level parallelism; Algorithm design and analysis; Bandwidth; Computer architecture; Energy efficiency; Failure analysis; Filtering; Multicore processing; Out of order; Process design; Proposals; Execution Locality; Kilo-Instruction Processors; Load/Store Queue; Power-Efficiency;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Architecture, 2008. ISCA '08. 35th International Symposium on
  • Conference_Location
    Beijing
  • ISSN
    1063-6897
  • Print_ISBN
    978-0-7695-3174-8
  • Type
    conf
  • DOI
    10.1109/ISCA.2008.10
  • Filename
    4556713