• DocumentCode
    2890622
  • Title

    On graceful degradation of microprocessors in presence of faults via resource banking

  • Author

    Rodrigues, Rance ; Kundu, Sandip

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Massachusetts at Amherst, Amherst, MA, USA
  • fYear
    2011
  • fDate
    13-15 July 2011
  • Firstpage
    61
  • Lastpage
    66
  • Abstract
    Reliability and manufacturability have emerged as dominant concerns for today's multi-billion-transistor chips. In this paper, we investigate how to degrade a chip multiprocessor (CMP) gracefully in the presence of faults, keeping its architected functionality intact at the expense of some loss of performance. The proposed solution involves banking resources and functional units so that partial shutdown can isolate faulty regions. Within a processor chip, resources can be classified into three broad classes: the large memory structures, such as caches, the TLB, and the register file; the small memory structures, namely the reorder buffer, issue queue, load-store buffer, etc.; and the datapath and control logic. The large arrays are usually well protected by ECC. Recent research has suggested that large datapath units such as the FPU and integer division units are good candidates for execution outsourcing to other working cores in a CMP. In this paper, we focus on the relatively small but critically important integer ALU and the small array structures. Outsourcing ALU operations incurs a large performance penalty, and small arrays are critical for control operations. The proposed solution is based on banking the array structures and execution units, as well as reducing instruction fetch, issue, and retire widths, so that individual banks can be disabled. Reducing resource size diminishes performance but retains functionality. We present performance degradation data for shutting down one or more banks of various small array structures and ALUs. Simulation confirms gradual degradation with diminishing resource sizes. Averaged over all considered structures, a performance loss of just 6% was observed for single-bank failures.
  • Keywords
    microprocessor chips; ALU unit; CMP; ECC; FPU; TLB; chip multiprocessor; integer division units; load-store buffer; microprocessor graceful degradation; multibillion transistor chips; register file; resource banking; Benchmark testing; Degradation; Hardware; Logic arrays; Pipelines; Process control; Registers; Reliability; critical instruction execution units; fault tolerance; hardware de-configuration; performance impact;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    On-Line Testing Symposium (IOLTS), 2011 IEEE 17th International
  • Conference_Location
    Athens
  • Print_ISBN
    978-1-4577-1053-7
  • Type

    conf

  • DOI
    10.1109/IOLTS.2011.5993812
  • Filename
    5993812