Title :
Analyze-NOW-an environment for collection and analysis of failures in a network of workstations
Author :
Thakur, Anshuman ; Iyer, Ravishankar K.
Author_Institution :
Center for Reliable & High Performance Comput., Illinois Univ., Urbana, IL, USA
fDate :
12/1/1996
Abstract :
This paper describes Analyze-NOW, an environment for the collection and analysis of failures and errors in a network of workstations. It covers the data collection methodology and the tool implemented to facilitate this process. The software tools used for analysis are described, with emphasis on the implementation of the Analyzer, the primary analysis tool. The tools are demonstrated by using them to collect and analyze failure data, gathered over a 32-week period, from a network of 69 SunOS-based workstations. A classification based on the source and effect of faults is used to identify problem areas. The different types of failures encountered on the machines and on the network are highlighted to develop a proper understanding of failures in a network environment. The results from the analysis tool can be used to pinpoint problem areas in the network. Applying Analyze-NOW to failure data from the monitored network reveals several notable aspects of its failure behavior. Nearly 70% of the failures were network-related, whereas disk errors were few. Network-related failures accounted for 75% of all hard failures (failures that make a workstation unusable). Half of the network-related failures were due to servers not responding to clients; the other half were performance-related or miscellaneous. Problem areas in the network were located using this tool. The authors' approach was compared with the method of locating problem areas from the network architecture; this comparison showed that the architecture-based method overestimates the number of problem areas.
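As a rough illustration of the source-and-effect classification summarized in the abstract, the sketch below tallies failure records by source and computes the share of hard failures attributed to a given source. The record fields (`host`, `source`, `hard`), the category names, and the toy data are hypothetical assumptions for illustration only; they are not the Analyzer's actual data format or the paper's measurements.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureRecord:
    host: str    # hypothetical workstation identifier
    source: str  # hypothetical source category, e.g. "network", "disk"
    hard: bool   # True if the failure made the workstation unusable


def source_shares(records):
    """Return the percentage of all failures attributed to each source."""
    counts = Counter(r.source for r in records)
    total = sum(counts.values())
    return {src: 100.0 * n / total for src, n in counts.items()}


def hard_failure_share(records, source):
    """Return the percentage of hard failures attributed to `source`."""
    hard = [r for r in records if r.hard]
    if not hard:
        return 0.0
    return 100.0 * sum(r.source == source for r in hard) / len(hard)


# Hypothetical toy records, not data from the monitored network.
records = [
    FailureRecord("ws01", "network", True),
    FailureRecord("ws02", "network", False),
    FailureRecord("ws03", "disk", True),
    FailureRecord("ws04", "network", True),
]

print(source_shares(records))  # {'network': 75.0, 'disk': 25.0}
print(hard_failure_share(records, "network"))
```

Separating the "source" tally from the "effect" filter mirrors the two axes of the classification: the same records answer both "what fraction of failures were network-related?" and "what fraction of hard failures were network-related?".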
Keywords :
computer network reliability; computerised monitoring; failure analysis; software tools; workstations; Analyze-NOW environment; SunOS-based workstations; failure analysis; failure data; fault classification; network architecture; software tools; workstation network reliability; Application software; Condition monitoring; Data analysis; Failure analysis; Fault diagnosis; Intelligent networks; Local area networks; Network servers; Software tools; Workstations;
Journal_Title :
Reliability, IEEE Transactions on