• DocumentCode
    2173008
  • Title
    Improving yield and reliability of chip multiprocessors
  • Author
    Pan, Abhisek ; Khan, Omer ; Kundu, Sandip
  • Author_Institution
    Univ. of Massachusetts, Amherst, MA
  • fYear
    2009
  • fDate
    20-24 April 2009
  • Firstpage
    490
  • Lastpage
    495
  • Abstract
    An increasing number of hardware failures can be attributed to device reliability problems that cause partial system failure or shutdown. In this paper, we propose a scheme for improving the reliability of a homogeneous chip multiprocessor (CMP) that also improves manufacturing yield. Our solution exploits the natural redundancy that already exists in multi-core systems: a faulty core uses services from other cores in place of its own defective functional units. A micro-architectural modification allows a core on a CMP to use another core as a coprocessor to service any instruction that the former cannot execute correctly. This service improves yield and reliability at the cost of some performance. To quantify this loss, we used a cycle-accurate simulator to model a dual-core system with one or two cores sustaining partial failure. Our results indicate that when a large and sparingly used unit such as a floating point arithmetic unit fails in a core, each faulty core can continue to run with help from companion cores, even on a floating point intensive benchmark, with as little as 10% performance impact and less than 1% area overhead.
  • Keywords
    floating point arithmetic; integrated circuit reliability; integrated circuit yield; microprocessor chips; multiprocessing systems; cycle-accurate simulator; device reliability problem; dual-core system; faulty core; floating point arithmetic unit; hardware failure; homogeneous chip multiprocessor; manufacturing yield; microarchitectural modification; multicore system; natural redundancy; partial system failure; partial system shutdown; Costs; Frequency; Hardware; Niobium compounds; Performance loss; Redundancy; Stress; Temperature; Titanium compounds; Voltage; microarchitecture; multiprocessors; reliability; yield;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Design, Automation & Test in Europe Conference & Exhibition, 2009. DATE '09.
  • Conference_Location
    Nice
  • ISSN
    1530-1591
  • Print_ISBN
    978-1-4244-3781-8
  • Type
    conf
  • DOI
    10.1109/DATE.2009.5090714
  • Filename
    5090714