Title :
An Architecture to Enable Life Cycle Testing in CMPs
Author :
Rodrigues, Rance ; Koren, Israel ; Kundu, Sandip
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Massachusetts at Amherst, Amherst, MA, USA
Abstract :
CMOS wear-out mechanisms such as time-dependent breakdown of gate dielectrics (TDDB), hot carrier injection (HCI), negative bias temperature instability (NBTI), electromigration (EM), and stress-induced voiding (SIV) are well documented in the literature. Often the onset of wear-out is gradual, with initial manifestation as delay defects that result in timing errors. This motivates the need for online testing. The combined effect of dynamic reconfiguration such as voltage and frequency scaling (DVFS) and signal integrity issues, coupled with aging-related wear-out, complicates a priori selection of test vectors, further favoring online testing. Traditional online test techniques such as Double and Triple Modular Redundancy (DMR and TMR) pose severe area and power overheads. In this paper we propose an architecture to assist online testing in a Chip Multiprocessor (CMP) based on execution path recording. Since core utilization in CMPs is low in practice, we can use the idle time of cores opportunistically to run test threads that mimic functional threads. The initiation, termination and comparison of test results are performed by a dedicated, simple and functionally limited small core that we call the Sentry Core (SC). The sentry core is hidden from the OS and has the ability to monitor and interrupt the general-purpose cores. Upon interrupt, the general-purpose core can send data to the sentry core. To detect errors, the SC initializes the general-purpose cores, collects signatures from hardware monitors (that compact execution traces), and compares them against those of duplicate test threads, obviating any need for cycle-by-cycle comparison. Major benefits of the proposed solution include: (1) online testing with minimal area overhead, (2) scalability, and (3) testability throughout the life cycle of a CMP.
Experimental results show that the proposed scheme is capable of detecting 87% of the faults injected into the processor at an area overhead of less than 3% of the target CMP.
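The signature-based comparison described in the abstract can be illustrated with a minimal software sketch. This is not the authors' hardware design: the use of CRC-32 as the compaction function, the trace of program-counter values, and the names `compact_trace` and `sentry_compare` are all assumptions made for illustration only. It shows the idea that two runs of the same thread are compared through one compact signature each, rather than cycle by cycle.

```python
import zlib

def compact_trace(trace):
    """Fold a sequence of program-counter values into one 32-bit signature.

    CRC-32 stands in for the hardware monitor's compaction logic
    (an assumption; the abstract does not name the compaction function).
    """
    sig = 0
    for pc in trace:
        # Feed each 64-bit PC value into the running CRC.
        sig = zlib.crc32(pc.to_bytes(8, "little"), sig)
    return sig

def sentry_compare(sig_a, sig_b):
    """Hypothetical sentry-core check: signatures match iff no error is flagged."""
    return sig_a == sig_b

# Hypothetical execution paths: the functional thread, a fault-free
# duplicate test thread, and a test thread that took a wrong branch.
functional_path = [0x400000, 0x400004, 0x400010, 0x400014]
test_path_ok = [0x400000, 0x400004, 0x400010, 0x400014]
test_path_bad = [0x400000, 0x400004, 0x400020, 0x400014]

sig_functional = compact_trace(functional_path)
sig_test_ok = compact_trace(test_path_ok)
sig_test_bad = compact_trace(test_path_bad)

print("fault-free run matches:", sentry_compare(sig_functional, sig_test_ok))
print("diverged run matches:  ", sentry_compare(sig_functional, sig_test_bad))
```

Only one fixed-size value per run crosses to the comparing core in this sketch, which is the property that removes the need for lockstep comparison; any compaction scheme can alias, so a matching signature does not by itself prove the absence of a fault.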
Keywords :
electromigration; error detection; hot carriers; microprocessor chips; power aware computing; wear; CMOS wear-out mechanism; CMP; architecture; chip multiprocessor; double modular redundancy; dynamic reconfiguration; electromigration; error detection; execution path recording; frequency scaling; gate dielectrics; general-purpose cores; hot carrier injection; life cycle testing; negative bias temperature instability; online testing; sentry core; signal integrity issue; stress induced voiding; time dependent breakdown; triple modular redundancy; voltage scaling; Benchmark testing; Degradation; Fault detection; Hardware; Multicore processing; Switches; low-cost test; microprocessor test; online fault detection; opportunistic test;
Conference_Title :
Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), 2011 IEEE International Symposium on
Conference_Location :
Vancouver, BC
Print_ISBN :
978-1-4577-1713-0
DOI :
10.1109/DFT.2011.26