Abstract :
In the space domain, complex hardware/software systems have to fulfill strong requirements during operation. Besides functionality, high quality of software subsystems w.r.t. extra-functional properties is an important issue. Especially for long-period missions, the assurance of reliability is critical for mission success. In case of system defects, remote maintenance is only feasible to a limited extent. To support the remote maintenance, techniques for detection and localization of faults (e.g., FDIR) are state of the art in today´s space software systems. Usually, such software comprises emergency operation modes to guarantee at least communication with ground control. This results from the fact that the fault recovery is primarily done in a conservative manner by experts on the ground. This may lead to increased mission delays and, e.g., loss of valuable measurement data. In the worst case, a lengthy interruption or insufficient capabilities of remote maintenance harm scientific instruments. Hence, concepts for autonomous fault handling are necessary to increase reliability of such expensively maintainable space systems. In this paper, we propose an approach for handling hardware faults by self-reconfiguration on the level of functional architecture at run time. We use a formal structure to describe multiple configurations of a system continuously adapting to changing environmental circumstances. A system configuration includes the structure of functional components (as part of an architecture) and their relations to corresponding hardware resources. Architectures may differ considerably due to the usage of different subsets of hardware resources. We distinguish between configurations on a higher level of abstraction by means of qualitative ratings for architectures. The rating relies upon an integration of quality attributes. This leads to a timed automaton like graph structure for reconfiguration decisions. Nodes represent quality-rated configurations and ed- es relate configurations among one another. A transition from one node to another is triggered when a hardware fault occurs. The computation of the ratings for configurations is done at design time to reduce evaluation efforts during operation. Thus, if a hardware resource of the system is no longer available at run time, a suitable alternative configuration is determined at low expenses. The process of triggering, determining, and deploying new reconfigurations is implemented as a control loop. The triggering is supported by the fault localization mechanism of a given FDIR approach. Based on our theoretical framework, we present a concept for prototypical tool support for evaluating and relating architectures. For the evaluation of our approach, we use a subsystem of a realworld space craft. We examine the attitude and orbit control subsystem of the TET-1 micro satellite bus. Our approach is applied to an extended version of the system with multiple redundancies in actuation. In this way, the capability to extend the autonomy of the satellite is analyzed. Additionally, an overview of the conventional FDIR mechanisms of the satellite is given.
Keywords :
aerospace instrumentation; artificial satellites; fault location; reliability; TET-1 microsatellite bus; attitude subsystem; autonomous fault handling; control loop; fault detection; fault localization; fault localization mechanism; fault recovery; formal structure; functional architectural level; hardware faults; orbit control subsystem; quality-rated configurations; reliability; self-reconfiguration; space systems; system configuration; Computer architecture; Fault tolerant systems; Hardware; Redundancy; Satellites; Software;