  • DocumentCode
    1243186
  • Title
    Beating in-order stalls with "flea-flicker" two-pass pipelining
  • Author
    Barnes, Ronald D.; Sias, John W.; Nystrom, Erik M.; Patel, Sanjay J.; Navarro, Jose; Hwu, Wen-Mei W.
  • Author_Institution
    George Mason Univ., Fairfax, VA, USA
  • Volume
    55
  • Issue
    1
  • fYear
    2006
  • Firstpage
    18
  • Lastpage
    33
  • Abstract
    While compilers have generally proven adept at planning useful static instruction-level parallelism for in-order microarchitectures, efficiently accommodating unanticipatable latencies, like those of load instructions, remains a vexing problem. Traditional out-of-order execution hides some of these latencies, but it repeats scheduling work already done by the compiler and adds pipeline overhead. Other techniques, such as prefetching and multithreading, can hide some anticipatable, long-latency misses, but not the shorter, more diffuse stalls caused by difficult-to-anticipate first- or second-level cache misses. Our work proposes a microarchitectural technique, two-pass pipelining, in which the program executes on two in-order back-end pipelines coupled by a queue. The "advance" pipeline often defers instructions that dispatch with unready operands rather than stalling on them. The "backup" pipeline concurrently resolves the instructions deferred by the first pipeline, allowing useful "advanced" execution to overlap with miss resolution. An accompanying compiler technique and instruction marking further enhance the handling of miss latencies. Applying our technique to an Itanium 2-like design achieves a speedup of 1.38x in mcf, the most memory-intensive SPECint2000 benchmark, and an average of 1.12x across other selected benchmarks, yielding between 32 percent and 67 percent of an idealized out-of-order design's speedup at a much lower design cost and complexity.
  • Keywords
    multi-threading; parallel architectures; parallelising compilers; pipeline processing; storage management; Itanium 2-like design; compiler technique; in-order microarchitecture; instructions dispatching; memory-intensive SPECint2000 benchmark; multithreading; prefetching; static instruction-level parallelism; two-pass pipelining; Delay; Dynamic scheduling; Microarchitecture; Out of order; Parallel processing; Pipeline processing; Prefetching; Processor scheduling; Registers; Runtime; Index Terms: Runahead execution; cache-miss tolerance; out-of-order execution; prefetching
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type
    jour
  • DOI
    10.1109/TC.2006.4
  • Filename
    1545748
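The defer-instead-of-stall idea in the abstract can be illustrated with a toy timing model. This is a minimal sketch under simplifying assumptions of our own, not the authors' simulator or the Itanium 2-like design they evaluate: each pipe issues one instruction per cycle, functional units and the coupling queue are unbounded, every instruction has a fixed latency (a cache miss is modeled simply as a long latency), and memory ordering, address operands, and the paper's compiler marking are ignored. The names `Instr`, `in_order_stall`, and `two_pass` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str
    srcs: tuple[str, ...]
    latency: int  # cycles until the result is usable; a miss is just a long latency here


def producers(program):
    """For each instruction, indices of the earlier instructions that produce its sources
    (last writer in program order); registers with no earlier writer are ready at cycle 0."""
    last_writer = {}
    result = []
    for i, ins in enumerate(program):
        result.append([last_writer[r] for r in ins.srcs if r in last_writer])
        last_writer[ins.dest] = i
    return result


def in_order_stall(program):
    """Single in-order pipeline: one issue per cycle, stalling until all operands are ready."""
    prods = producers(program)
    done = [0] * len(program)
    cycle = 0
    for i, ins in enumerate(program):
        issue = max([cycle] + [done[p] for p in prods[i]])
        done[i] = issue + ins.latency
        cycle = issue + 1
    return max(done, default=0)


def two_pass(program):
    """Toy two-pass model: the advance pass defers instead of stalling,
    and the backup pass later executes only the deferred instructions, in order."""
    prods = producers(program)
    done = [0] * len(program)
    is_deferred = [False] * len(program)
    queue = []  # stands in for the coupling queue; holds deferred indices in program order
    # Advance pass: instruction i dispatches at cycle i and never stalls.
    for i, ins in enumerate(program):
        if any(is_deferred[p] or done[p] > i for p in prods[i]):
            is_deferred[i] = True  # unready operand: defer, and make dependents defer too
            queue.append(i)
        else:
            done[i] = i + ins.latency
    # Backup pass: deferred instructions issue one per cycle, after the advance pass
    # has dispatched them and once their operands are ready.
    prev = -1
    for i in queue:
        issue = max([prev + 1, i + 1] + [done[p] for p in prods[i]])
        done[i] = issue + program[i].latency
        prev = issue
    return max(done, default=0)


# Two independent 20-cycle load misses, each with a dependent add, joined at the end.
prog = [
    Instr("r1", (), 20),          # load that misses
    Instr("r2", ("r1",), 1),      # consumer of the first miss
    Instr("r3", (), 20),          # independent load that also misses
    Instr("r4", ("r3",), 1),      # consumer of the second miss
    Instr("r5", ("r2", "r4"), 1),
]
print(in_order_stall(prog), two_pass(prog))  # 43 24
```

In the stalling pipeline the second miss cannot even be issued until the first one resolves, so the two latencies add up. In the two-pass model the advance pass issues the second load right away and defers only the dependent adds, so the two misses overlap. Producers are resolved by program-order last writer so that the backup pass waits on the correct earlier result even if a later instruction reuses the same register name.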