DocumentCode :
2265205
Title :
iTrack: Correlating user activity with system data
Author :
Mann, Vijay ; Vishnoi, Anilkumar
Author_Institution :
IBM Res., New Delhi, India
fYear :
2012
fDate :
16-20 April 2012
Firstpage :
1068
Lastpage :
1074
Abstract :
Human error has been identified as one of the major factors behind system outages and network downtime in a number of previous research papers and surveys. Gartner statistics show that almost 40% of unplanned application downtime is caused by operator errors such as unintentional network configuration changes that result in a network outage, patch installations, and service restarts. Yet system admin activities on production IT systems are rarely logged and monitored properly. Existing tools for tracking user activities produce either too much information, without any hint of a potential outage scenario, or too little information to be useful. In this paper, we describe the design and implementation of iTrack, a framework for monitoring user activities and correlating them with system data. iTrack uses commonly available native monitoring and diagnostic utilities on operating systems to monitor system events as well as system admin activity, correlates these two sets of information, and categorizes the activity as potentially abnormal or harmful based on its impact on the system in terms of file system, network, and process activities. We demonstrate the usefulness of iTrack through several use cases and real-world examples, such as detecting and diagnosing system outages in real time, conducting post-mortem analysis of outages, and maintaining audit logs. Our experimental evaluation of iTrack confirms that its monitoring overhead, in terms of CPU time, activity completion time, and data generated, is within the tolerance range of most production systems. In cases where the overhead was found to be unacceptable, we detect the underlying cause and provide solutions. These solutions improve performance by up to 20% and 90% in terms of managed server and iTrack server CPU utilization, respectively, and by up to 2 times in terms of completion time of certain system admin activities on the managed server.
Keywords :
business data processing; computerised monitoring; management of change; multiprocessing systems; service-oriented architecture; CPU time; Gartner statistics; activity completion time; audit logs; data generation; file system; human error; iTrack; iTrack server CPU utilization; monitor systems; network configuration; network downtime; network outage; operator errors; patch installations; production IT systems; service restart; system admin activities; system data; user activities monitoring; user activity correlation; Data visualization; Engines; Monitoring; Operating systems; Production; Real time systems; Servers; change management; outage detection; problem determination; user activity monitoring;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Network Operations and Management Symposium (NOMS), 2012 IEEE
Conference_Location :
Maui, HI
ISSN :
1542-1201
Print_ISBN :
978-1-4673-0267-8
Electronic_ISBN :
1542-1201
Type :
conf
DOI :
10.1109/NOMS.2012.6212031
Filename :
6212031