Title :
A Generic Execution Management Framework for Scientific Applications
Author :
Elahi, Tanvire ; Kiddle, Cameron ; Simmonds, Rob
Author_Institution :
Dept. of Comput. Sci., Univ. of Calgary, Calgary, AB, Canada
Abstract :
Managing the execution of scientific applications in a heterogeneous grid computing environment can be a daunting task, particularly for long-running jobs. Increasing fault tolerance by checkpointing and migrating jobs between resources requires expertise and time from the scientist. Automating such tasks allows the scientist to focus more on the scientific results and less on the technical details. In this paper, a generic framework for managing and automating the execution of jobs is presented. It uses a variety of information models describing systems, policies, and application details and requirements to decide where and how to run, checkpoint, migrate, and reconfigure jobs as needed. To demonstrate the utility of the framework, it is used in a simulation study that assesses how the availability of application memory usage information affects the ability to meet the QoS objectives of job submitters and the overall utilization of resources. The study shows that with greater availability of memory usage information, the execution management framework is better able to meet user objectives and improve utilization of resources, particularly when the objective is to make more efficient use of resources.
Keywords :
fault tolerance; grid computing; resource allocation; QoS objective; generic execution management framework; heterogeneous grid computing; memory usage information; Application Modelling; Automation; Execution Management; Grid Computing; Simulation
Conference_Title :
High Performance Computing and Communications (HPCC), 2010 12th IEEE International Conference on
Conference_Location :
Melbourne, VIC
Print_ISBN :
978-1-4244-8335-8
Electronic_ISBN :
978-0-7695-4214-0
DOI :
10.1109/HPCC.2010.117