Title :
HEFT: A hybrid system-level framework for enabling energy-efficient fault-tolerance in NoC based MPSoCs
Author :
Zou, Yong; Pasricha, Sudeep
Author_Institution :
Dept. of Electr. & Comput. Eng., Colorado State Univ., Fort Collins, CO, USA
Abstract :
In emerging CMOS process technologies, network-on-chip (NoC) fabrics are becoming increasingly susceptible to transient faults. Fault-tolerance mechanisms typically employed in NoCs entail significant energy overheads, which are expected to become prohibitive as fault rates increase in future CMOS technologies. We propose a system-level framework called HEFT to trade off energy consumption and fault tolerance in the NoC fabric. Our hybrid framework tackles the challenge of enabling energy-efficient resilience in NoCs in two phases: at design time and at runtime. At design time, we implement an algorithm to guide the robust mapping of cores onto a die while satisfying application bandwidth and latency constraints. At runtime, we devise a prediction algorithm that monitors and detects changes in the fault susceptibility of NoC components in order to intelligently balance energy consumption and reliability. Experimental results show that HEFT improves the energy/reliability ratio of synthesized solutions by 8-20% while meeting application performance goals, compared with multiple prior works on reliable system-level NoC design.
Keywords :
CMOS integrated circuits; energy conservation; energy consumption; fault tolerance; integrated circuit design; integrated circuit reliability; multiprocessing systems; network-on-chip; transient analysis; CMOS process technology; CMOS technology; HEFT; NoC based MPSoC; NoC component; NoC fabric; application bandwidth; energy-efficient fault-tolerance; energy-efficient resilience; fault rate; fault susceptibility; fault-tolerance mechanism; hybrid system-level framework; latency constraint; network-on-chip fabric; prediction algorithm; reliability; reliable system-level NoC design; robust mapping; transient fault; Bandwidth; Fault tolerance; Fault tolerant systems; Reliability engineering; Runtime; Tunneling magnetoresistance; System-level design; fault-tolerance; networks-on-chip
Conference_Title :
Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2014 International Conference on
Conference_Location :
New Delhi
DOI :
10.1145/2656075.2656087