Title :
Enhanced FPGA reliability through efficient run-time fault reconfiguration
Author :
Lach, John ; Mangione-Smith, William H. ; Potkonjak, Miodrag
Author_Institution :
Dept. of Electr. Eng., Virginia Univ., Charlottesville, VA, USA
fDate :
9/1/2000 12:00:00 AM
Abstract :
The expanded use of field programmable gate arrays (FPGAs) in remote, long life, and system-critical applications requires the development and implementation of effective, efficient FPGA fault-tolerance techniques. FPGAs have inherent redundancy and in-the-field reconfiguration capabilities, thus providing alternatives to standard integrated circuit redundancy-based fault-recovery techniques. Runtime reliability can be enhanced by using these unique features. Recovery from permanent logic and interconnect faults without runtime computer-aided design (CAD) support can be efficiently performed with the use of fine-grained and physical design partitioning. Faults are localized to small partitioned blocks that have fixed interfaces to the surrounding portions of the design, and the affected blocks are reconfigured with previously generated, functionally equivalent block instances that do not use the faulty resources. This technique minimizes the post-fault-detection system downtime, while requiring little area overhead. Only the finely located faulty portions of the FPGA are removed from use. In addition, the end user need not have access to CAD tools, making the algorithm completely transparent to system users. This approach has been efficiently implemented on a diverse set of FPGA architectures. The algorithm's flexibility is also apparent from the variable emphases that can be placed on system reliability, area overhead, timing overhead, design effort, and system memory. Given user-defined emphases, the algorithm can be tailored to specific application requirements. Experiments using random s-independent and s-correlated fault models reveal that the approach enhances system reliability, while minimizing area and timing overhead.
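The block-level recovery step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `BlockInstance` and `select_instance`, the resource labels `c0` to `c3`, and the four-cell block are all hypothetical, and a real flow would load a stored configuration for the chosen instance rather than return an object. The sketch assumes each partitioned block ships with several precomputed, functionally equivalent instances, and that recovery means picking one whose resources avoid the detected faults.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class BlockInstance:
    """One precomputed, functionally equivalent layout of a block (hypothetical model)."""
    name: str
    used_resources: FrozenSet[str]  # logic/interconnect resources this layout occupies


def select_instance(
    instances: List[BlockInstance], faulty: Iterable[str]
) -> Optional[BlockInstance]:
    """Return the first stored instance that uses none of the faulty resources.

    No placement or routing is performed here; the choice is a lookup over
    instances generated ahead of time, which is why no CAD tool is needed at run time.
    """
    faulty_set = set(faulty)
    for inst in instances:
        if inst.used_resources.isdisjoint(faulty_set):
            return inst
    return None  # the block cannot be repaired with the stored instances


# Hypothetical block with four cells; each instance leaves a different cell spare.
INSTANCES = [
    BlockInstance("cfg_spare_c3", frozenset({"c0", "c1", "c2"})),
    BlockInstance("cfg_spare_c2", frozenset({"c0", "c1", "c3"})),
    BlockInstance("cfg_spare_c1", frozenset({"c0", "c2", "c3"})),
    BlockInstance("cfg_spare_c0", frozenset({"c1", "c2", "c3"})),
]

if __name__ == "__main__":
    chosen = select_instance(INSTANCES, {"c1"})
    print(chosen.name if chosen else "no fault-free instance available")
```

In this toy model a single faulty cell is always recoverable, while two faulty cells in the same block are not, since every stored instance leaves only one cell unused. Storing more instances per block would trade system memory for reliability, which is one of the emphases the abstract mentions.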
Keywords :
circuit reliability; fault diagnosis; field programmable gate arrays; FPGA reliability enhancement; area overhead; efficient run-time fault reconfiguration; fault-recovery techniques; fine-grained partitioning; in-the-field reconfiguration capabilities; inherent redundancy; interconnect faults; long life applications; permanent logic faults; physical design partitioning; random s-correlated fault models; random s-independent fault models; remote applications; runtime reliability; small partitioned blocks; system memory; system reliability; system-critical applications; timing overhead; Application software; Circuit faults; Design automation; Fault tolerant systems; Field programmable gate arrays; Integrated circuit reliability; Logic design; Redundancy; Runtime; Timing
Journal_Title :
Reliability, IEEE Transactions on