• DocumentCode
    169680
  • Title

    HEFT: A hybrid system-level framework for enabling energy-efficient fault-tolerance in NoC based MPSoCs

  • Author

    Zou, Yong ; Pasricha, Sudeep

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Colorado State Univ., Fort Collins, CO, USA
  • fYear
    2014
  • fDate
    12-17 Oct. 2014
  • Firstpage
    1
  • Lastpage
    10
  • Abstract
    In emerging CMOS process technologies, network-on-chip (NoC) fabrics are becoming increasingly susceptible to transient faults. Fault-tolerance mechanisms typically employed in NoCs entail significant energy overheads that are expected to become prohibitive as fault rates increase in future CMOS technologies. We propose a system-level framework called HEFT to trade off energy consumption and fault tolerance in the NoC fabric. Our hybrid framework tackles the challenge of enabling energy-efficient resilience in NoCs in two phases: at design time and at runtime. At design time, we implement an algorithm to guide the robust mapping of cores onto a die while satisfying application bandwidth and latency constraints. At runtime, we devise a prediction algorithm to monitor and detect changes in the fault susceptibility of NoC components, to intelligently balance energy consumption and reliability. Experimental results show that HEFT improves the energy/reliability ratio of synthesized solutions by 8-20%, while meeting application performance goals, when compared to multiple prior works on reliable system-level NoC design.
  • Keywords
    CMOS integrated circuits; energy conservation; energy consumption; fault tolerance; integrated circuit design; integrated circuit reliability; multiprocessing systems; network-on-chip; transient analysis; CMOS process technology; CMOS technology; HEFT; NoC based MPSoC; NoC component; NoC fabric; application bandwidth; energy-efficient fault-tolerance; energy-efficient resilience; fault rate; fault susceptibility; fault-tolerance mechanism; hybrid system-level framework; latency constraint; network-on-chip fabric; prediction algorithm; reliability; reliable system-level NoC design; robust mapping; transient fault; Bandwidth; Fault tolerance; Fault tolerant systems; Reliability engineering; Runtime; Tunneling magnetoresistance; System-level design; fault-tolerance; networks-on-chip
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2014 International Conference on
  • Conference_Location
    New Delhi
  • Type

    conf

  • DOI
    10.1145/2656075.2656087
  • Filename
    6971820