  • DocumentCode
    2136440
  • Title
    Dual-core execution: building a highly scalable single-thread instruction window
  • Author
    Zhou, Huiyang
  • Author_Institution
    Sch. of Comput. Sci., Central Florida Univ., Orlando, FL, USA
  • fYear
    2005
  • fDate
    17-21 Sept. 2005
  • Firstpage
    231
  • Lastpage
    242
  • Abstract
    Current integration trends embrace the prosperity of single-chip multi-core processors. Although multi-core processors deliver significantly improved system throughput, single-thread performance is not addressed. In this paper, we propose a new execution paradigm that utilizes multi-cores on a single chip collaboratively to achieve high performance for single-thread memory-intensive workloads while maintaining the flexibility to support multithreaded applications. The proposed execution paradigm, dual-core execution, consists of two superscalar cores (a front and back processor) coupled with a queue. The front processor fetches and preprocesses instruction streams and retires processed instructions into the queue for the back processor to consume. The front processor executes instructions as usual except for cache-missing loads, which produce an invalid value instead of blocking the pipeline. As a result, the front processor runs far ahead to warm up the data caches and fix branch mispredictions for the back processor. In-flight instructions are distributed in the front processor, the queue, and the back processor, forming a very large instruction window for single-thread out-of-order execution. The proposed architecture incurs only minor hardware changes and does not require any large centralized structures such as large register files, issue queues, load/store queues, or reorder buffers. Experimental results show remarkable latency hiding capabilities of the proposed architecture, even outperforming more complex single-thread processors with much larger instruction windows than the front or back processor.
  • Keywords
    cache storage; multi-threading; multiprocessing systems; dual-core execution; highly scalable single-thread instruction window; issue queues; load-store queues; multithreaded applications; register files; reorder buffers; single-chip multicore processors; single-thread memory-intensive workloads; single-thread out-of-order execution; Buffer storage; Collaborative work; Delay; Hardware; Multicore processing; Out of order; Pipelines; Registers; Throughput; Windows;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Architectures and Compilation Techniques, 2005. PACT 2005. 14th International Conference on
  • ISSN
    1089-795X
  • Print_ISBN
    0-7695-2429-X
  • Type
    conf
  • DOI
    10.1109/PACT.2005.18
  • Filename
    1515596