Author_Institution :
Dept. of Electr. & Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, USA
Abstract :
Producing reliable integrated systems is becoming extremely difficult due to the increasing variability and uncertainty inherent in advancing fabrication technologies; the worsening effects of various wear-out mechanisms; and environmental disturbances (e.g., soft errors due to radiation). Solutions that rely on traditional redundancy-based approaches are likely to be impractical because of their negative impact on power and performance. Because of this trend, it is widely believed that approaches for enabling chips to self-X (where X = monitor, diagnose, calibrate, compensate, heal, etc.) will be needed to ensure reliable operation over a chip's expected lifetime. In collaboration with Prof. Mitra of Stanford University, our work in this area is focused on developing methodologies that enable integrated systems to self-monitor, self-diagnose, and self-compensate for various non-idealities. In this talk, I will describe how state-of-the-art diagnosis of failing ICs is used today to extract valuable information about design, manufacturing, and test itself. Although tremendously beneficial, this kind of diagnosis requires a fault simulator, significant amounts of design data, and powerful computer servers. It is therefore infeasible to implement traditional effect-cause diagnosis within a system. We are instead developing efficient cause-effect (i.e., fault dictionary) based approaches for performing in-system diagnosis. I will conclude the talk by describing approaches for developing efficient fault dictionaries for on-chip implementation.
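As a rough illustration of the cause-effect idea mentioned in the abstract, the sketch below shows a fault dictionary in its simplest textbook form: the pass/fail signature of each modeled fault is computed ahead of time, so that diagnosis reduces to a table lookup with no fault simulator in the loop. This is not the method presented in the talk. The fault names, the three-test signatures, and the functions `build_dictionary` and `diagnose` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical precomputed data: for each modeled fault, a pass/fail
# signature over three tests (1 = test fails when the fault is present).
# In practice these would come from offline fault simulation.
FAULT_SIGNATURES = {
    "net_a/stuck-at-0": (1, 0, 1),
    "net_b/stuck-at-1": (0, 1, 1),
    "net_c/stuck-at-0": (1, 0, 1),
}


def build_dictionary(fault_signatures):
    """Invert fault -> signature into signature -> set of candidate faults."""
    dictionary = defaultdict(set)
    for fault, signature in fault_signatures.items():
        dictionary[tuple(signature)].add(fault)
    return dict(dictionary)


def diagnose(dictionary, observed):
    """Return the modeled faults whose signature matches the observed one."""
    return dictionary.get(tuple(observed), set())


DICTIONARY = build_dictionary(FAULT_SIGNATURES)

if __name__ == "__main__":
    # Two faults share this signature, so the lookup cannot tell them apart.
    print(sorted(diagnose(DICTIONARY, (1, 0, 1))))
```

In this toy form the lookup is cheap but the table grows with the number of faults and tests, which is one reason compact dictionaries matter for an on-chip setting.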
Keywords :
fault diagnosis; integrated circuit manufacture; integrated circuit reliability; integrated circuit testing; environmental disturbance; fabrication technology; fault dictionary; fault simulator; integrated circuit self-compensation; integrated circuit self-diagnosing; integrated circuit self-monitoring; integrated systems; on-chip implementation; reliable operation; robust systems; wear out mechanisms; Biomedical monitoring; Computers; Data mining; Dictionaries; Integrated circuit modeling; Servers; System-on-a-chip;