• DocumentCode
    1760682
  • Title

    Control-Flow Decoupling: An Approach for Timely, Non-Speculative Branching

  • Author

    Sheikh, Rami ; Tuck, James ; Rotenberg, Eric

  • Author_Institution
    Qualcomm Res., Raleigh, NC, USA
  • Volume
    64
  • Issue
    8
  • fYear
    2015
  • fDate
    Aug. 1 2015
  • Firstpage
    2182
  • Lastpage
    2203
  • Abstract
    Mobile and PC/server class processor companies continue to roll out flagship core microarchitectures that are faster than their predecessors. Meanwhile, placing more cores on a chip coupled with constant supply voltage puts per-core energy consumption at a premium. Hence, the challenge is to find future microarchitecture optimizations that not only increase performance but also conserve energy. Eliminating branch mispredictions, which waste both time and energy, is valuable in this respect. In this paper, we explore the control-flow landscape by characterizing mispredictions in four benchmark suites. We find that a third of mispredictions-per-1K-instructions (MPKI) come from what we call separable branches: branches with large control-dependent regions (not suitable for if-conversion), whose backward slices do not depend on their control-dependent instructions or have only a short dependence. We propose control-flow decoupling (CFD) to eradicate mispredictions of separable branches. The idea is to separate the loop containing the branch into two loops: the first contains only the branch's predicate computation and the second contains the branch and its control-dependent instructions. The first loop communicates branch outcomes to the second loop through an architectural queue. Microarchitecturally, the queue resides in the fetch unit to drive timely, non-speculative branching. On a microarchitecture configured similar to Intel's Sandy Bridge core, CFD increases performance by up to 55 percent, and reduces energy consumption by up to 49 percent (for CFD regions). Moreover, for some applications, CFD is a necessary catalyst for future complexity-effective large-window architectures to tolerate memory latency.
  • Keywords
    computer architecture; energy conservation; microprocessor chips; power aware computing; CFD; Intel's Sandy Bridge core; MPKI; PC/server class processor companies; architectural queue; complexity-effective large-window architectures; control-dependent instructions; control-flow decoupling; control-flow landscape; flagship core microarchitectures; memory latency; microarchitecture optimizations; mispredictions-per-1K-instructions; mobile companies; nonspeculative branching; per-core energy consumption; supply voltage; Computational fluid dynamics; Energy consumption; Ground penetrating radar; Hardware; Microarchitecture; Multicore processing; Software; branch prediction; instruction level parallelism; isa extensions; pre-execution; predication; separable branches; software/hardware codesign
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type

    jour

  • DOI
    10.1109/TC.2014.2361526
  • Filename
    6915862
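
The loop-splitting idea described in the abstract can be illustrated with a small software analogy. The sketch below is not the authors' implementation: the function names, the squared-sum workload, and the use of a Python deque as a stand-in for the architectural queue are all assumptions made for illustration. It only shows that splitting one loop into a predicate loop and a consumer loop preserves the result; it does not reproduce the hardware benefit of a fetch-unit queue.

```python
from collections import deque


def original_loop(data, threshold):
    """Baseline: predicate and control-dependent work share one loop."""
    total = 0
    for x in data:
        if x > threshold:      # hard-to-predict branch
            total += x * x     # stands in for a large control-dependent region
    return total


def decoupled_loops(data, threshold):
    """Same computation, split into a predicate loop and a consumer loop."""
    # Hypothetical software stand-in for the architectural queue.
    branch_queue = deque()

    # Loop 1: only the branch's predicate computation; outcomes pushed in order.
    for x in data:
        branch_queue.append(x > threshold)

    # Loop 2: the branch and its control-dependent instructions; the branch
    # direction is popped from the queue instead of being recomputed.
    total = 0
    for x in data:
        if branch_queue.popleft():
            total += x * x
    return total


sample = [3, 9, 1, 7, 4, 8]
print(original_loop(sample, 5), decoupled_loops(sample, 5))
```

In this analogy the second loop never evaluates the predicate itself, which mirrors the abstract's claim that branch outcomes arrive through the queue rather than through prediction.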