Title :
On graceful degradation of microprocessors in presence of faults via resource banking
Author :
Rodrigues, Rance ; Kundu, Sandip
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Massachusetts at Amherst, Amherst, MA, USA
Abstract :
Reliability and manufacturability have emerged as dominant concerns for today's multi-billion transistor chips. In this paper, we investigate how to degrade a chip multiprocessor (CMP) gracefully in the presence of faults, keeping its architected functionality intact at the expense of some loss of performance. The proposed solution involves banking resources and functional units so that partial shutdown can isolate faulty regions. Within a processor chip, resources can be classified into three broad classes: large memory structures such as the caches, TLB, and register file; small memory structures such as the reorder buffer, issue queue, and load-store buffer; and the datapath and control logic. The large arrays are usually well protected by ECC. Recent research has suggested that large datapath units such as the FPU and integer division units are good candidates for execution outsourcing to other working cores in a CMP. In this paper, we focus on the relatively small but critically important integer ALUs and small array structures. Outsourcing ALU operations incurs a large performance penalty, and small arrays are critical for control operations. The proposed solution is based on banking array structures and execution units, as well as reducing instruction fetch, issue, and retire widths, so that individual banks can be disabled. Reducing resource size diminishes performance but retains functionality. We present performance degradation data for shutting down one or more banks of various small array structures and ALUs. Simulation confirms gradual degradation with diminishing resource sizes. Averaged over all considered structures, a performance loss of just 6% was observed for single-bank failures.
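The banking idea in the abstract can be illustrated with a small sketch. It is not the authors' simulator or hardware design: the class name `BankedStructure`, the choice of 4 banks of 16 entries, and the rule that the last working bank cannot be disabled are all assumptions made for illustration. It only shows how disabling a faulty bank shrinks usable capacity while the structure stays functional.

```python
from dataclasses import dataclass, field


@dataclass
class BankedStructure:
    """Hypothetical model of a small array (e.g. a reorder buffer) split into equal banks."""
    name: str
    num_banks: int
    entries_per_bank: int
    disabled: set = field(default_factory=set)  # indices of banks shut down due to faults

    def disable_bank(self, bank: int) -> None:
        """Isolate a faulty bank; refuse if that would leave no working bank (assumed policy)."""
        if not 0 <= bank < self.num_banks:
            raise ValueError(f"{self.name}: no such bank {bank}")
        if len(self.disabled | {bank}) >= self.num_banks:
            raise RuntimeError(f"{self.name}: cannot disable the last working bank")
        self.disabled.add(bank)

    @property
    def usable_entries(self) -> int:
        """Capacity that remains after faulty banks are switched off."""
        return (self.num_banks - len(self.disabled)) * self.entries_per_bank

    def usable_slots(self) -> list:
        """(bank, index) pairs that allocation logic may still use."""
        return [
            (bank, idx)
            for bank in range(self.num_banks)
            if bank not in self.disabled
            for idx in range(self.entries_per_bank)
        ]


# Illustrative sizes only; the paper's actual configurations are not given in the abstract.
rob = BankedStructure("reorder_buffer", num_banks=4, entries_per_bank=16)
full_capacity = rob.usable_entries
rob.disable_bank(2)  # a fault is detected in bank 2
degraded_capacity = rob.usable_entries
```

In this sketch a single-bank failure removes a quarter of the entries. How much performance that costs depends on the workload and is what the paper measures by simulation; the sketch makes no claim about it.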
Keywords :
microprocessor chips; ALU unit; CMP; ECC; FPU; TLB; chip multiprocessor; integer division units; load-store buffer; microprocessor graceful degradation; multibillion transistor chips; register file; resource banking; Benchmark testing; Degradation; Hardware; Logic arrays; Pipelines; Process control; Registers; Reliability; critical instruction execution units; fault tolerance; hardware de-configuration; performance impact;
Conference_Title :
On-Line Testing Symposium (IOLTS), 2011 IEEE 17th International
Conference_Location :
Athens
Print_ISBN :
978-1-4577-1053-7
DOI :
10.1109/IOLTS.2011.5993812