• DocumentCode
    2693285
  • Title
    Identifying symptoms of recurrent faults in log files of distributed information systems
  • Author
    Reidemeister, Thomas ; Munawar, Mohammad A. ; Ward, Paul A S
  • Author_Institution
    E&CE Dept., Univ. of Waterloo, Waterloo, ON, Canada
  • fYear
    2010
  • fDate
    19-23 April 2010
  • Firstpage
    187
  • Lastpage
    194
  • Abstract
    The manual process of identifying causes of failure in distributed information systems is difficult and time-consuming. The underlying reason is the large size and complexity of these systems, and the vast amount of monitoring data they generate. Despite its high cost, this manual process is necessary in order to avoid the detrimental consequences of system downtime. Several studies and operator practice suggest that a large fraction of the failures in these systems are caused by recurrent faults. Therefore, significant efficiency gains can be achieved by automating the identification of these faults. In this work we present methods, which draw from the areas of information retrieval as well as machine learning, to automate the task of inferring symptoms pertinent to failures caused by specific faults. In particular, we present a method to infer message types from plain-text log messages, and we leverage these types to train classifiers and extract rules to identify symptoms of recurrent faults automatically.
  • Keywords
    computer network management; fault tolerant computing; information retrieval; learning (artificial intelligence); distributed information systems; efficiency gain; fault diagnosis; information retrieval; log files; machine learning; plain-text log messages; recurrent faults; system downtime; Condition monitoring; Costs; Data mining; Distributed information systems; Fault diagnosis; Humans; Information retrieval; Machine learning; Management information systems; Performance analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Network Operations and Management Symposium (NOMS), 2010 IEEE
  • Conference_Location
    Osaka
  • ISSN
    1542-1201
  • Print_ISBN
    978-1-4244-5366-5
  • Electronic_ISBN
    1542-1201
  • Type
    conf
  • DOI
    10.1109/NOMS.2010.5488459
  • Filename
    5488459