DocumentCode :
1700439
Title :
Chameleon: adaptive fault tolerance using reliable, mobile agents
Author :
Iyer, R.K. ; Kalbarczyk, Z. ; Bagchi, S.
Author_Institution :
Center for Reliable & High Performance Comput., Illinois Univ., Urbana, IL, USA
fYear :
1997
Firstpage :
61
Lastpage :
62
Abstract :
In networked computing systems, a broad range of commercial and scientific applications that need varying degrees of availability must coexist. Developing a dedicated reliable platform for each case is not cost-effective; it is more efficient to build an infrastructure that provides the level of dependability each application requires. It is also essential that the proposed alternatives leverage off-the-shelf components. There have been extensive studies of fault tolerance strategies that provide efficient mechanisms for dealing with system operational failures. Most of this work, however, has focused on specific application needs and has therefore provided only piecemeal solutions. Little work has addressed how to build a reliable networked computing system out of unreliable computation nodes. As a result, there is no comprehensive solution for providing a wide range of fault-tolerant services in a single networked environment. The most feasible way to understand how such a software environment would fit on top of existing layers (the operating system, the network interfaces, etc.) is to implement an infrastructure that provides a range of reliable services. The fundamental components of the envisioned infrastructure (Chameleon) have been designed so that none of them is a single point of failure. Each component is active only for a certain period, e.g., while the system configuration is being set up. If a component fails during its active phase, there is a provision for recovery, either by switching to a backup or by regenerating the component.
Keywords :
adaptive systems; computer network reliability; fault tolerant computing; software agents; system recovery; Chameleon; adaptive fault tolerance; application needs; availability; backup; commercial applications; component active period; component failure; component regeneration; cost-effectiveness; dependability; fault-tolerant services; off-the-shelf components; reliable mobile agents; reliable networked computing systems; scientific applications; software environment; system configuration setup; system operational failures; system recovery; unreliable computation nodes; Application software; Availability; Computer networks; Environmental management; Fault diagnosis; Fault tolerance; Intelligent agent; Mobile agents; Remote monitoring; Software libraries;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Reliable Distributed Systems, 1997. Proceedings., The Sixteenth Symposium on
Conference_Location :
Durham, NC
ISSN :
1060-9857
Print_ISBN :
0-8186-8177-2
Type :
conf
DOI :
10.1109/RELDIS.1997.632798
Filename :
632798