DocumentCode :
2053958
Title :
TDP-Shell: A Generic Framework to Improve Interoperability between Batch Queue Systems and Monitoring Tools
Author :
Ivars, Vicente J. ; Senar, Miquel A. ; Heymann, Elisa
Author_Institution :
Comput. Archit. & Oper. Syst. Dept., Univ. Autonoma de Barcelona, Barcelona, Spain
fYear :
2011
fDate :
26-30 Sept. 2011
Firstpage :
522
Lastpage :
526
Abstract :
Nowadays, distributed applications, including MPI implementations, are executed on computer clusters managed by a batch queue system. Users rely on monitoring tools to detect run-time problems in applications running in those environments. However, using monitoring tools on a cluster controlled by a batch queue system is a challenge, because batch queue systems and monitoring tools do not coordinate the management of the resources they share when executing a distributed application. We call this problem a lack of interoperability, and to solve it we have developed a framework called TDP-Shell. This framework supports different batch queue systems, such as Condor and SGE, and different monitoring tools, such as Paradyn, GDB, and TotalView, without any changes to their source code. In this paper we describe how our basic design of TDP-Shell for sequential applications was redesigned to support the monitoring of MPI applications executed on a cluster controlled by a batch queue system.
Keywords :
message passing; open systems; queueing theory; resource allocation; system monitoring; Condor; GDB; MPI application monitoring; Paradyn; SGE; TDP-Shell; TotalView; batch queue systems; computer clusters; distributed application; interoperability; monitoring tools; resource management; resource sharing; run-time problem detection; sequential applications; Computer architecture; Libraries; Manuals; Monitoring; Portable document format; Protocols; Resource management
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cluster Computing (CLUSTER), 2011 IEEE International Conference on
Conference_Location :
Austin, TX
Print_ISBN :
978-1-4577-1355-2
Electronic_ISBN :
978-0-7695-4516-5
Type :
conf
DOI :
10.1109/CLUSTER.2011.73
Filename :
6061200