Title :
Prototype of fault adaptive embedded software for large-scale real-time systems
Author :
Messie, Derek ; Jung, Mina ; Oh, Jae C. ; Shetty, Shweta ; Nordstrom, Steven ; Haney, Michael
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., Syracuse Univ., NY, USA
Abstract :
This paper describes a comprehensive prototype of large-scale fault adaptive embedded software developed for the proposed Fermilab BTeV high energy physics experiment. Lightweight self-optimizing agents embedded within Level 1 of the prototype are responsible for proactive and reactive monitoring and mitigation based on specified layers of competence. The agents are self-protecting, detecting cascading failures using a distributed approach. Adaptive, reconfigurable, and mobile objects for reliability are designed to be self-configuring to adapt automatically to dynamically changing environments. These objects provide a self-healing layer with the ability to discover, diagnose, and react to discontinuities in real-time processing. A generic modeling environment was developed to facilitate design and implementation of hardware resource specifications, application data flow, and failure mitigation strategies. Level 1 of the planned BTeV trigger system alone will consist of 2500 DSPs, so the number of components and intractable fault scenarios involved make it impossible to design an "expert system" that applies traditional centralized mitigative strategies based on rules capturing every possible system state. Instead, a distributed reactive approach is implemented using the tools and methodologies developed by the Real-Time Embedded Systems group.
Keywords :
data acquisition; embedded systems; formal specification; high energy physics instrumentation computing; particle accelerators; software fault tolerance; software prototyping; BTeV trigger system; Fermilab BTeV high energy physics experiment; Real-Time Embedded Systems group; fault adaptive embedded software prototype; large-scale real-time systems; lightweight self-optimizing agents; proactive monitoring; reactive monitoring; Condition monitoring; Digital signal processing; Embedded software; Hardware; Large-scale systems; Power system faults; Power system protection; Prototypes; Real time systems; Software prototyping;
Conference_Title :
12th IEEE International Conference and Workshops on the Engineering of Computer-Based Systems (ECBS '05), 2005
Print_ISBN :
0-7695-2308-0
DOI :
10.1109/ECBS.2005.56