• DocumentCode
    1350544
  • Title
    Maximizing Spare Utilization by Virtually Reorganizing Faulty Cache Lines
  • Author
    Ansari, Amin ; Gupta, Shantanu ; Feng, Shuguang ; Mahlke, Scott
  • Author_Institution
    Comput. Sci. & Eng. Dept., Univ. of Michigan, Ann Arbor, MI, USA
  • Volume
    60
  • Issue
    1
  • fYear
    2011
  • Firstpage
    35
  • Lastpage
    49
  • Abstract
    Aggressive technology scaling to 45 nm and below introduces serious reliability challenges to the design of microprocessors. Since a large fraction of chip area is devoted to on-chip caches, it is important to protect these SRAM structures against lifetime and manufacture-time failures. Designers typically overprovision caches with additional resources to overcome hard faults. However, static allocation and binding of redundant spares results in low utilization of the extra resources and ultimately limits the number of defects that can be tolerated. This work re-examines the design of process-variation-tolerant on-chip caches with a focus on providing the flexibility and dynamic reconfigurability necessary to tolerate large numbers of defects with modest hardware overhead. Our approach, ZerehCache, virtually reorganizes the cache data array using a permutation network to provide more degrees of freedom for spare allocation. A graph coloring algorithm is used to configure the network and identify the proper mapping of replacement elements. We perform an extensive design space exploration of both L1 and L2 caches to identify several Pareto-optimal ZerehCaches. Given these optimal design points, we employ ZerehCache to extend the effective lifetime of the on-chip caches and prevent early lifetime failures. Finally, yield analysis studies performed on a population of 1,000 chips at the 45 nm technology node demonstrate that an L1 design with 16 percent area overhead and an L2 design with 8 percent area overhead achieve yields of 99 percent and 96 percent, respectively.
  • Keywords
    Pareto optimisation; SRAM chips; cache storage; fault tolerant computing; graph colouring; microprocessor chips; resource allocation; virtual storage; Pareto-optimal ZerehCaches; SRAM structures; aggressive technology; cache data array; faulty cache lines; graph coloring algorithm; lifetime failures; microprocessors; on-chip caches; reliability; resource allocation; virtual reorganization; Arrays; Manufacturing; Random access memory; Redundancy; System-on-a-chip; Transistors; Process variation; fault-tolerant cache memories; manufacturing yield; wearout;
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type
    jour
  • DOI
    10.1109/TC.2010.204
  • Filename
    5601696