Title :
A Flexible Software-Based Framework for Online Detection of Hardware Defects
Author :
Constantinides, Kypros ; Mutlu, Onur ; Austin, Todd ; Bertacco, Valeria
Author_Institution :
Univ. of Michigan, Ann Arbor, MI, USA
Abstract :
This work proposes a new software-based defect detection and diagnosis technique. We introduce a novel set of instructions, called access-control extensions (ACE), that can access and control the microprocessor's internal state. Special firmware periodically suspends microprocessor execution and uses the ACE instructions to run directed tests on the hardware. When a hardware defect is present, these tests can diagnose and locate it, and then activate system repair through resource reconfiguration. The software nature of our framework makes it flexible: testing techniques can be modified or upgraded in the field to trade off performance against reliability without requiring any change to the hardware. We describe and evaluate different execution models for using the ACE framework. We also describe how the proposed ACE framework can be extended and utilized to improve the quality of post-silicon debugging and manufacturing testing of modern processors. We evaluated our technique on a commercial chip-multiprocessor based on Sun's Niagara and found that it can provide very high coverage, with 99.22 percent of all silicon defects detected. Moreover, our results show that the average performance overhead of software-based testing is only 5.5 percent. Based on a detailed register transfer level (RTL) implementation of our technique, we find its area and power consumption overheads to be modest, with a 5.8 percent increase in total chip area and a 4 percent increase in the chip's overall power consumption.
Keywords :
computer debugging; electronic engineering computing; fault diagnosis; firmware; logic testing; microprocessor chips; multiprocessing systems; reliability; silicon; access-control extension; chip-multiprocessor; hardware defect online detection; post-silicon debugging; register transfer level; software-based defect detection; Debugging; Energy consumption; Hardware; Manufacturing processes; Microprocessors; Microprogramming; Software performance; Software testing; Sun; System testing; Reliability; hardware defects; manufacturing test; online defect detection; online self-test; testing;
Journal_Title :
Computers, IEEE Transactions on