• DocumentCode
    1436723
  • Title
    Prediction Router: A Low-Latency On-Chip Router Architecture with Multiple Predictors
  • Author
    Matsutani, Hiroki ; Koibuchi, Michihiro ; Amano, Hideharu ; Yoshinaga, Tsutomu
  • Author_Institution
    Dept. of Inf. Phys. & Comput., Univ. of Tokyo, Tokyo, Japan
  • Volume
    60
  • Issue
    6
  • fYear
    2011
  • fDate
    6/1/2011
  • Firstpage
    783
  • Lastpage
    799
  • Abstract
    Multicore and many-core applications are sensitive to interprocessor communication latency, which suggests the need for low-latency on-chip networks. We propose a low-latency router architecture that predicts the output channel to be used by the next packet transfer and speculatively completes switch arbitration to reduce communication latency. If the prediction hits, packets arriving at a prediction router are forwarded without waiting for routing computation and switch arbitration. The key factor in reducing communication latency is therefore the hit rate of the prediction algorithm, which varies with the network environment, such as the network topology, routing algorithm, and traffic pattern. Typical low-latency routers that skip one or more pipeline stages use a bypass data path based on a static or single bypassing policy (e.g., accelerating packets that keep moving in the same dimension); in contrast, our prediction router forwards packets using a prediction algorithm selected from several candidates to suit the network environment. We analyze the prediction hit rates of five prediction algorithms on meshes, tori, fat trees, and Spidergons. We then present four case studies, each assuming a different many-core architecture. We implemented the prediction routers for each case study in a 45 nm CMOS process and evaluated them in terms of prediction hit rate, zero-load latency, area, and energy consumption. A typical prediction router with two or three predictors achieves a prediction hit rate of up to 89.8 percent on real applications, at the cost of a 4.8-12.0 percent increase in area and a 5.3 percent increase in energy, which provides a favorable trade-off between modest hardware/energy overhead and significant latency savings.
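    To make the idea of "hit rate" and "selecting among several predictors" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' method: the abstract does not name its five algorithms, so the three predictors below (static_straight, latest_port, most_frequent), the port labels "E"/"N"/"W", and the rule of choosing the predictor with the best hit rate on an observed trace are all hypothetical. A hit means the predicted output port equals the port the packet actually needs, which is the case where routing computation and switch arbitration could be skipped.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical predictor type: given the output ports used by earlier packets
# at one input channel, return the predicted output port for the next packet.
Predictor = Callable[[Sequence[str]], str]


def static_straight(port: str) -> Predictor:
    """Always predict the same port (a static, single bypassing policy)."""
    def predict(history: Sequence[str]) -> str:
        return port
    return predict


def latest_port(default: str) -> Predictor:
    """Predict the port used by the previous packet (default if none yet)."""
    def predict(history: Sequence[str]) -> str:
        return history[-1] if history else default
    return predict


def most_frequent(default: str) -> Predictor:
    """Predict the port used most often so far (ties keep first-seen port)."""
    def predict(history: Sequence[str]) -> str:
        if not history:
            return default
        return Counter(history).most_common(1)[0][0]
    return predict


def hit_rate(predictor: Predictor, trace: List[str]) -> float:
    """Fraction of packets whose output port was predicted correctly."""
    if not trace:
        return 0.0
    hits = sum(
        1 for i, actual in enumerate(trace) if predictor(trace[:i]) == actual
    )
    return hits / len(trace)


def select_predictor(
    candidates: Dict[str, Predictor], trace: List[str]
) -> Tuple[str, Dict[str, float]]:
    """Pick the candidate with the highest hit rate on the given trace."""
    rates = {name: hit_rate(p, trace) for name, p in candidates.items()}
    best = max(rates, key=rates.get)  # first-listed candidate wins ties
    return best, rates


if __name__ == "__main__":
    # Hypothetical trace of output ports seen at one input channel.
    trace = ["E", "E", "N", "N", "N", "N", "W", "W"]
    candidates = {
        "static_straight": static_straight("E"),
        "latest_port": latest_port("E"),
        "most_frequent": most_frequent("E"),
    }
    best, rates = select_predictor(candidates, trace)
    print(best, rates)
```

    On this made-up trace the static policy hits only 2 of 8 packets (0.25), while latest_port hits 6 of 8 (0.75) and is selected. The point the sketch tries to mirror is the one the abstract makes: which predictor works best depends on the traffic seen by the router, so a router that can choose among predictors may outperform a single fixed bypass policy.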
  • Keywords
    CMOS integrated circuits; microprocessor chips; multiprocessing systems; network routing; network-on-chip; CMOS process; bypass data path; communication latency; energy consumption; interprocessor communication latencies; low-latency on-chip networks; low-latency on-chip router architecture; low-latency router architecture; low-latency routers; many-core applications; multicore applications; multiple predictors; network environments; network topology; packet transfer; prediction algorithm; prediction hit rate; prediction router architecture; prediction routers; routing algorithm; routing computation; single bypassing policy; static bypassing policy; switch arbitration; traffic pattern; zero-load latency; Computer architecture; Pipelines; Prediction algorithms; Routing; Switches; Switching circuits; System-on-a-chip; Interconnection networks; on-chip networks
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Computers
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type
    jour
  • DOI
    10.1109/TC.2011.17
  • Filename
    5703069