Title :
A job monitoring system for the LCG computing grid
Author :
Hammad, Ahmad ; Harenberg, Torsten ; Igdalov, Dimitri ; Mättig, Peter ; Meder-Marouelli, David ; Ueberholz, Peer
Author_Institution :
Dept. of Comput. Sci., Niederrhein Univ. of Appl. Sci., Krefeld
Abstract :
Experience with generating simulation data for high energy physics experiments has shown that a job monitoring system (JMS) is essential for understanding job failures within the grid. Such a system can report, in parallel with the running user job, the status of both the job and the worker node. It should support users directly by letting them interact with the running job, and it should be able to correct errors automatically. Furthermore, such a system can be extended to classify errors automatically, which can improve the stability and performance of the grid environment. To increase acceptance of the grid, a graphical user interface (GUI) has been developed and integrated with the job monitoring system. Both components are currently integrated in the computing environment used to generate data for the D0 experiment. This paper describes the basic components of the job monitoring software.
Keywords :
graphical user interfaces; grid computing; system monitoring; LCG computing grid; automatic error classification; automatic error correction; grid environment; high energy physics experiments; job monitoring system; large hadron collider; simulation data; Computational modeling; Computer science; Computerized monitoring; Condition monitoring; Graphical user interfaces; Grid computing; Large Hadron Collider; Mesh generation; Physics; User interfaces;
Conference_Title :
Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International
Conference_Location :
Rhodes Island
Print_ISBN :
1-4244-0054-6
DOI :
10.1109/IPDPS.2006.1639659