Title :
Task scheduling for reconfigurable systems in dynamic fault-rate environments
Author :
Jacobs, A. ; Wulf, Nicholas ; George, Alan D.
Author_Institution :
NSF Center for High-Performance Reconfigurable Comput. (CHREC), Univ. of Florida, Gainesville, FL, USA
Abstract :
Commercial SRAM-based field-programmable gate arrays (FPGAs) can provide space applications with the performance, energy efficiency, and adaptability needed to meet next-generation mission requirements. However, mitigating an FPGA's susceptibility to radiation-induced faults is challenging. Triple-modular redundancy (TMR) techniques are traditionally used to mitigate radiation effects, but TMR incurs substantial overheads such as increased area and power requirements. Using partial reconfiguration (PR), FPGAs could dynamically adjust the fault-tolerance scheme as the radiation environment changes over time. Managing these dynamic adjustments requires a fault-tolerant task scheduler. We improve scheduling in the presence of time-varying fault rates by developing a fault-tolerant scheduling heuristic. Our heuristic combines task execution time and system fault rate to determine the optimal fault-tolerance mode for each task. The heuristic is evaluated using software simulations of a system in periodic and burst fault environments. Results show that our scheduling technique reduces the task rejection ratio by 94% in periodic environments and by 48% in burst environments relative to static TMR, and that the adaptive heuristic approaches the performance of an optimal predetermined heuristic. Integrating our fault-tolerant scheduling heuristic with existing PR architectures can enable their use in dynamic fault environments.
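The abstract does not give the heuristic's formula. The sketch below only illustrates the general idea of combining execution time and fault rate to pick a fault-tolerance mode. It assumes Poisson-distributed faults, independent replicas, and a perfect majority voter. The two-mode table (`simplex`, `tmr`), the region costs, the `max_failure_prob` threshold, and the function names are all hypothetical and are not taken from the paper.

```python
import math

# Hypothetical mode table: (mode name, PR regions consumed). Assumed for illustration.
MODES = [("simplex", 1), ("tmr", 3)]


def p_fault(rate_per_s: float, exec_time_s: float) -> float:
    """Probability of at least one fault hitting one replica during execution,
    assuming faults arrive as a Poisson process with the given rate."""
    return 1.0 - math.exp(-rate_per_s * exec_time_s)


def p_mode_failure(mode: str, p: float) -> float:
    """Probability that the task output is wrong in the given mode."""
    if mode == "simplex":
        return p
    if mode == "tmr":
        # Majority voting fails only if at least 2 of 3 independent replicas fault.
        return 3 * p**2 * (1 - p) + p**3
    raise ValueError(f"unknown mode: {mode}")


def choose_mode(rate_per_s: float, exec_time_s: float, max_failure_prob: float = 1e-3):
    """Return the cheapest (mode, regions) whose failure probability is within the
    hypothetical threshold, or None if no mode is safe enough."""
    p = p_fault(rate_per_s, exec_time_s)
    for name, regions in sorted(MODES, key=lambda m: m[1]):
        if p_mode_failure(name, p) <= max_failure_prob:
            return name, regions
    return None  # a scheduler might reject or defer the task in this case


if __name__ == "__main__":
    print(choose_mode(1e-6, 10.0))  # low fault rate: simplex suffices
    print(choose_mode(1e-3, 10.0))  # elevated fault rate: TMR needed
    print(choose_mode(0.1, 10.0))   # burst-level rate: no mode meets the threshold
```

Under these assumptions, a long task in a high fault-rate period is pushed toward TMR, while the same task in a quiet period can run unprotected and free up PR regions. Returning `None` stands in for a task rejection, the metric the abstract reports.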
Keywords :
SRAM chips; fault tolerance; field programmable gate arrays; integrated circuit reliability; radiation hardening (electronics); redundancy; scheduling; FPGAs; PR architectures; SRAM-based field-programmable gate arrays; TMR techniques; adaptive heuristic approach; dynamic fault-rate environments; energy-efficiency; fault environments; fault-tolerant scheduling heuristic; fault-tolerant task scheduler scheme; next-generation mission requirements; optimal predetermined heuristic; partial reconfiguration; radiation effect mitigation; radiation environment; radiation-induced faults; reconfigurable systems; software simulations; system fault rate; task execution time; time-varying fault rates; triple-modular redundancy techniques; Fault tolerance; Fault tolerant systems; Field programmable gate arrays; Schedules; Space vehicles;
Conference_Title :
High Performance Extreme Computing Conference (HPEC), 2013 IEEE
Conference_Location :
Waltham, MA
Print_ISBN :
978-1-4799-1364-0
DOI :
10.1109/HPEC.2013.6670334