Title :
Finding associations in Grid monitoring data
Author :
Maier, Gerhild ; van der Ster, Daniel ; Kranzlmüller, Dieter
Author_Institution :
CERN, Geneva, Switzerland
Abstract :
Error handling is a crucial task in infrastructures as complex as grids. Several monitoring tools can report failing grid jobs together with their error codes. However, the error codes do not always indicate the actual fault that originally caused the job failure, so human time and expertise are required to trace errors back manually to the underlying fault. We perform Association Rule Mining on grid job monitoring data to automatically retrieve knowledge about the grid components' behaviour by taking dependencies between grid job characteristics into account. In this way, problematic grid components are located automatically, and this information, expressed as association rules, is visualised in a web interface. This work reduces the time needed for fault recovery and consequently improves a grid's reliability.
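As an illustration of the idea in the abstract, the following is a minimal sketch of mining association rules from job records by counting support and confidence. It is not the authors' implementation: the record fields (`site`, `user`, `outcome`), the sample data, the function name `mine_rules`, and the thresholds are all hypothetical, and a brute-force count over small antecedents stands in for whatever mining algorithm the paper actually uses.

```python
from collections import Counter
from itertools import combinations

# Hypothetical job monitoring records; field names and values are invented.
jobs = [
    {"site": "A", "user": "u1", "outcome": "E1"},
    {"site": "A", "user": "u2", "outcome": "E1"},
    {"site": "A", "user": "u1", "outcome": "E1"},
    {"site": "B", "user": "u1", "outcome": "ok"},
    {"site": "B", "user": "u2", "outcome": "ok"},
    {"site": "B", "user": "u2", "outcome": "E2"},
]


def mine_rules(records, target_key, min_support, min_confidence):
    """Return rules (antecedent, consequent, support, confidence).

    The antecedent is a tuple of one or two (attribute, value) pairs and the
    consequent is the (target_key, value) pair of the same record.
    """
    n = len(records)
    antecedent_counts = Counter()  # how often each antecedent occurs
    rule_counts = Counter()        # how often antecedent and consequent co-occur

    for rec in records:
        consequent = (target_key, rec[target_key])
        items = sorted((k, v) for k, v in rec.items() if k != target_key)
        for size in (1, 2):
            for antecedent in combinations(items, size):
                antecedent_counts[antecedent] += 1
                rule_counts[(antecedent, consequent)] += 1

    rules = []
    for (antecedent, consequent), both in rule_counts.items():
        support = both / n                                # P(antecedent and consequent)
        confidence = both / antecedent_counts[antecedent]  # P(consequent | antecedent)
        if support >= min_support and confidence >= min_confidence:
            rules.append((antecedent, consequent, support, confidence))

    # Strongest rules first: by confidence, then by support.
    rules.sort(key=lambda r: (-r[3], -r[2]))
    return rules


rules = mine_rules(jobs, "outcome", min_support=0.3, min_confidence=0.9)
for antecedent, consequent, support, confidence in rules:
    print(antecedent, "=>", consequent, f"support={support:.2f}", f"confidence={confidence:.2f}")
```

In this toy data set, a rule whose antecedent is a single site and whose consequent is an error code would point at that site as a candidate problematic component, which is the kind of information the abstract describes presenting in a web interface.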
Keywords :
Internet; data mining; error handling; grid computing; information retrieval; system recovery; Web interface; association rule mining; error code; error handling; fault recovery; grid job monitoring data; knowledge retrieval; monitoring tool; problematic grid component; Association rules; Computerized monitoring; Condition monitoring; Data mining; Grid computing; Humans; Information retrieval; Large Hadron Collider; Mesh generation; Visualization;
Conference_Title :
Grid Computing, 2009 10th IEEE/ACM International Conference on
Conference_Location :
Banff, AB
Print_ISBN :
978-1-4244-5148-7
Electronic_ISBN :
978-1-4244-5149-4
DOI :
10.1109/GRID.2009.5353076