DocumentCode
3474922
Title
A method for the construction and interpretation of high level models for distributed fault-tolerant systems
Author
Tilly, K. ; Kiss, I. ; Roman, Graciela ; Dobrowiecki, T. ; Várkonyi-Kóczy, A.R.
Author_Institution
Dept. of Meas. & Instrum. Eng., Budapest Tech. Univ., Hungary
fYear
1995
fDate
13-15 Sep 1995
Firstpage
72
Lastpage
81
Abstract
Traditional solutions for achieving fault tolerance are intended for use at design time, and they generally capture system information at a very low (hardware or machine-instruction) level. Increasing the reliability of complex information systems containing many (perhaps many thousands of) autonomous components requires different solutions. This article presents a new methodology for the implementation of large-scale, distributed fault-tolerant systems. System models are formed of objects describing requirements, services and resources, organized into high-level, top-down hierarchical decomposition structures. Since redundancy is a natural property of any large-scale system, such models make it possible to achieve fault-tolerant behaviour by finding multiple appropriate mappings between requirements and available services, and by supporting the required services with available resources. The distributed system is extended with dedicated components, called diagnostic centres, which manage distinct parts of the system model, continuously observe the operation of the distributed system, and find alternative requirement-service mappings if some services fail to fulfil their associated requirements. The elements and structure of the proposed system modelling method are presented, an appropriate fault model is defined, and the algorithms for model interpretation are described.
Keywords
distributed processing; fault tolerant computing; autonomous components; complex information systems; design time; diagnostic centres; distributed fault-tolerant systems; high level models; high level top-down hierarchical decomposition structures; system information; system modelling method; Computer architecture; Distributed computing; Fault tolerance; Fault tolerant systems; Hardware; Information systems; Instruments; Large-scale systems; Redundancy; Reliability engineering;
fLanguage
English
Publisher
IEEE
Conference_Titel
Reliable Distributed Systems, 1995. Proceedings., 14th Symposium on
Conference_Location
Bad Neuenahr
ISSN
1060-9857
Print_ISBN
0-8186-7153-X
Type
conf
DOI
10.1109/RELDIS.1995.526215
Filename
526215