  • DocumentCode
    1241191
  • Title
    A Flexible Software-Based Framework for Online Detection of Hardware Defects
  • Author
    Constantinides, Kypros ; Mutlu, Onur ; Austin, Todd ; Bertacco, Valeria
  • Author_Institution
    Univ. of Michigan, Ann Arbor, MI, USA
  • Volume
    58
  • Issue
    8
  • fYear
    2009
  • Firstpage
    1063
  • Lastpage
    1079
  • Abstract
    This work proposes a new software-based defect detection and diagnosis technique. We introduce a novel set of instructions, called access-control extensions (ACE), that can access and control the microprocessor's internal state. Special firmware periodically suspends microprocessor execution and uses the ACE instructions to run directed tests on the hardware. When a hardware defect is present, these tests can diagnose and locate it, and then activate system repair through resource reconfiguration. The software nature of our framework makes it flexible: testing techniques can be modified or upgraded in the field to trade off performance against reliability without requiring any change to the hardware. We describe and evaluate different execution models for using the ACE framework. We also describe how the proposed ACE framework can be extended to improve the quality of post-silicon debugging and manufacturing testing of modern processors. We evaluated our technique on a commercial chip-multiprocessor based on Sun's Niagara and found that it can provide very high coverage, detecting 99.22 percent of all silicon defects. Moreover, our results show that the average performance overhead of software-based testing is only 5.5 percent. Based on a detailed register transfer level (RTL) implementation of our technique, we find its area and power overheads to be modest: a 5.8 percent increase in total chip area and a 4 percent increase in the chip's overall power consumption.
  • Keywords
    computer debugging; electronic engineering computing; fault diagnosis; firmware; logic testing; microprocessor chips; multiprocessing systems; reliability; silicon; access-control extension; chip-multiprocessor; hardware defect online detection; post-silicon debugging; register transfer level; software-based defect detection; Debugging; Energy consumption; Hardware; Manufacturing processes; Microprocessors; Microprogramming; Software performance; Software testing; Sun; System testing; Reliability; hardware defects; manufacturing test; online defect detection; online self-test; post-silicon debugging; testing
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type
    jour
  • DOI
    10.1109/TC.2009.52
  • Filename
    4815209