DocumentCode :
2890646
Title :
On graceful degradation of chip multiprocessors in presence of faults via flexible pooling of critical execution units
Author :
Rodrigues, Rance ; Kundu, Sandip
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Massachusetts at Amherst, Amherst, MA, USA
fYear :
2011
fDate :
13-15 July 2011
Firstpage :
67
Lastpage :
72
Abstract :
Reliability and manufacturability have emerged as dominant concerns for today's multi-billion transistor chips. In this paper, we investigate how to degrade a chip multiprocessor (CMP) gracefully in the presence of faults, keeping its architected functionality intact at the expense of some loss of performance. The proposed solution involves sharing critical execution resources among cores to survive faults. Recent research has suggested that large datapath units such as the FPU and integer division units are good candidates for execution outsourcing to other working cores in a CMP. In this paper, we focus on the relatively small but critically important integer ALU. Outsourcing ALU operations incurs a large performance penalty, and better solutions need to be in place to ensure survivability with minimal performance loss. We propose provisioning a shared ALU among a set of cores that can act as a spare for any constituent core in the group. This solution works well for single ALU failures but leads to resource contention when multiple ALUs fail. Simulation case studies on the MediaBench and MiBench benchmarks show that the proposed solution allows the CMP to remain functionally intact with no performance penalty for single ALU failures and no more than 1.5% performance loss on average for the failure of a single ALU in each core.
Keywords :
digital arithmetic; fault tolerance; flexible electronics; integrated circuit reliability; microprocessor chips; multiprocessing systems; ALU failure; CMP; MediaBench benchmarks; MiBench benchmarks; chip multiprocessor degradation; critical execution resource; critical execution units; datapath units; fault tolerance; flexible pooling; integer ALU unit; outsourcing ALU operation; transistor chip manufacturability; transistor chip reliability; Benchmark testing; Hardware; Multicore processing; Outsourcing; Radiation detectors; Redundancy; Reliability; critical instruction execution unit; dynamic hardware sharing; fault tolerance; performance impact;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
On-Line Testing Symposium (IOLTS), 2011 IEEE 17th International
Conference_Location :
Athens
Print_ISBN :
978-1-4577-1053-7
Type :
conf
DOI :
10.1109/IOLTS.2011.5993813
Filename :
5993813