  • DocumentCode
    2579206
  • Title
    The StageNet fabric for constructing resilient multicore systems
  • Author
    Gupta, Shantanu ; Feng, Shuguang ; Ansari, Amin ; Blome, Jason ; Mahlke, Scott
  • Author_Institution
    Adv. Comput. Archit. Lab., Univ. of Michigan, Ann Arbor, MI
  • fYear
    2008
  • fDate
    8-12 Nov. 2008
  • Firstpage
    141
  • Lastpage
    151
  • Abstract
    Scaling of CMOS feature size has long been a source of dramatic performance gains. However, the reduction in voltage levels has not been able to match this rate of scaling, leading to increasing operating temperatures and current densities. Given that most wearout mechanisms that plague semiconductor devices are highly dependent on these parameters, significantly higher failure rates are projected for future technology generations. Consequently, high reliability and fault tolerance, which have traditionally been subjects of interest for high-end server markets, are now getting emphasis in the mainstream desktop and embedded systems space. The popular solution for this has been the use of redundancy at a coarse granularity, such as dual/triple modular redundancy. In this work, we challenge the practice of coarse-granularity redundancy by identifying its inability to scale to high failure rate scenarios and investigating the advantages of finer-grained configurations. To this end, this paper presents and evaluates a highly reconfigurable multicore architecture, named StageNet (SN), that is designed with reliability as its first-class design criterion. SN relies on a reconfigurable network of replicated processor pipeline stages to maximize the useful lifetime of a chip, gracefully degrading performance towards the end of life. Our results show that the proposed SN architecture can perform nearly 50% more cumulative work compared to a traditional multicore.
  • Keywords
    CMOS integrated circuits; fault tolerance; integrated circuit reliability; microprocessor chips; redundancy; CMOS feature size; StageNet fabric; coarse-granularity redundancy; fault tolerance; processor pipeline; reliability; resilient multicore systems; wearout mechanisms; Current density; Fabrics; Multicore processing; Performance gain; Redundancy; Semiconductor devices; Space technology; Temperature; Tin; Voltage;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Microarchitecture, 2008. MICRO-41. 2008 41st IEEE/ACM International Symposium on
  • Conference_Location
    Lake Como
  • ISSN
    1072-4451
  • Print_ISBN
    978-1-4244-2836-6
  • Electronic_ISBN
    1072-4451
  • Type
    conf
  • DOI
    10.1109/MICRO.2008.4771786
  • Filename
    4771786