DocumentCode
1863747
Title
Log summarization and anomaly detection for troubleshooting distributed systems
Author
Gunter, Dan ; Tierney, Brian L. ; Brown, Aaron ; Swany, Martin ; Bresnahan, John ; Schopf, Jennifer M.
Author_Institution
Lawrence Berkeley Nat. Lab., Berkeley
fYear
2007
fDate
19-21 Sept. 2007
Firstpage
226
Lastpage
234
Abstract
Today's system monitoring tools are capable of detecting system failures such as host failures, OS errors, and network partitions in near-real time. Unfortunately, the same cannot yet be said of the end-to-end distributed software stack. Any given action, for example, reliably transferring a directory of files, can involve a wide range of complex and interrelated actions across multiple pieces of software: checking user certificates and permissions, getting details for all files, performing third-party transfers, understanding re-try policy decisions, etc. We present an infrastructure for troubleshooting complex middleware, a general purpose technique for configurable log summarization, and an anomaly detection technique that works in near-real time on running Grid middleware. We present results gathered using this infrastructure from instrumented Grid middleware and applications running on the Emulab testbed. From these results, we analyze the effectiveness of several algorithms at accurately detecting a variety of performance anomalies.
Keywords
grid computing; middleware; security of data; system monitoring; system recovery; Grid middleware; anomaly detection; checking user certificates; end-to-end distributed software stack; log summarization; system failures; system monitoring tools; troubleshooting distributed systems; Condition monitoring; Debugging; Degradation; Instruments; Laboratories; Middleware; Permission; Software performance; Software tools; Testing
fLanguage
English
Publisher
IEEE
Conference_Titel
Grid Computing, 2007 8th IEEE/ACM International Conference on
Conference_Location
Austin, Texas
Print_ISBN
978-1-4244-1560-1
Electronic_ISBN
978-1-4244-1560-1
Type
conf
DOI
10.1109/GRID.2007.4354137
Filename
4354137