DocumentCode
2720456
Title
Distributed multicomputer system availability based on measurements: A case study
Author
Hsueh, Mei-Chen
Author_Institution
Digital Equipment Corp., Maynard, MA, USA
fYear
1991
fDate
27-30 Mar 1991
Firstpage
78
Lastpage
84
Abstract
The author presents an experimental approach to evaluating the availability of distributed multicomputer systems. The measurement of a distributed system was conducted in an operational environment. To understand system failure behavior, all host computer restarts and their causes were collected. There was no centralized automatic logging mechanism. Data were collected from each individual computer. The method proposed to identify multiple-failure events from ERRLOG data of 14 VAX hosts is based on the moving window technique and possibility reasoning. The proposed rules, although very simple and focusing only on high-level reasoning, demonstrate a framework of using possibility reasoning for decision making. This study was conducted on a large-scale VAXcluster system. Results showed that about 55% of restarts were due to dependent failures and most of them were scheduled orderly shutdowns. System availability was then estimated from a performance aspect.
Keywords
distributed processing; multiprocessing systems; performance evaluation; ERRLOG data; VAX hosts; decision making; distributed multicomputer system availability; possibility reasoning; system failure behavior; window technique; Availability; Computer aided software engineering; Control systems; Distributed computing; Hardware; Operating systems; System performance; Time measurement; Topology; Voice mail
fLanguage
English
Publisher
IEEE
Conference_Titel
Computers and Communications, 1991. Conference Proceedings., Tenth Annual International Phoenix Conference on
Conference_Location
Scottsdale, AZ
Print_ISBN
0-8186-2133-8
Type
conf
DOI
10.1109/PCCC.1991.113795
Filename
113795