DocumentCode :
1816456
Title :
Mining Logs Files for Computing System Management
Author :
Peng, Wei ; Li, Tao ; Ma, Sheng
Author_Institution :
Sch. of Comput. Sci., Florida Int. Univ., Miami, FL
fYear :
2005
fDate :
13-16 June 2005
Firstpage :
309
Lastpage :
310
Abstract :
With advances in science and technology, computing systems are becoming increasingly difficult to monitor, manage, and maintain. Traditional approaches to system management have largely relied on domain experts, who translate domain knowledge into operating rules and policies through a knowledge acquisition process. This process has proven cumbersome, labor-intensive, and error-prone. There is thus a pressing need for automatic and efficient approaches to monitoring and managing complex computing systems. A popular approach to system management is based on analyzing system log files. However, several aspects of system log data have received little emphasis in existing analysis methods and pose several challenges. These aspects include disparate formats and relatively short text messages in data reporting, asynchronous data collection, and temporal characteristics in data representation. First, a typical computing system contains different devices with different software components, possibly from different providers. These components have multiple ways to report events, conditions, errors, and alerts. The heterogeneity and inconsistency of log formats make it difficult to automate problem determination. To perform automated analysis, we need to categorize the text messages with disparate formats into common situations. Second, text messages in the log files are relatively short but draw on a large vocabulary. Third, each text message usually contains a timestamp. The temporal characteristics provide additional context information for the messages and can be used to facilitate data analysis. In this paper, we apply text mining to automatically categorize the messages into a set of common categories, and propose two approaches for incorporating temporal information to improve the categorization performance.
Keywords :
Bayes methods; data mining; data structures; message passing; object-oriented programming; asynchronous data collection; computing system management; context information; data analysis; data reporting; data representation; domain knowledge; knowledge acquisition; log file mining; log format heterogeneity; log format inconsistency; operating policies; operating rules; software components; system log file analysis; system maintenance; system monitoring; temporal characteristics; text message categorization; text mining; Computer science; Computerized monitoring; Data analysis; Knowledge acquisition; Knowledge management; Machine learning; Performance analysis; Pressing; Technology management; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Autonomic Computing, 2005. ICAC 2005. Proceedings. Second International Conference on
Conference_Location :
Seattle, WA
Print_ISBN :
0-7695-2276-9
Type :
conf
DOI :
10.1109/ICAC.2005.40
Filename :
1498077