• DocumentCode
    2263451
  • Title

    A Generic Execution Management Framework for Scientific Applications

  • Author

    Elahi, Tanvire ; Kiddle, Cameron ; Simmonds, Rob

  • Author_Institution
    Dept. of Comput. Sci., Univ. of Calgary, Calgary, AB, Canada
  • fYear
    2010
  • fDate
    1-3 Sept. 2010
  • Firstpage
    544
  • Lastpage
    551
  • Abstract
    Managing the execution of scientific applications in a heterogeneous grid computing environment can be a daunting task, particularly for long-running jobs. Increasing fault tolerance by checkpointing and migrating jobs between resources requires the scientist's expertise and time. Automating such tasks allows the scientist to focus more on the scientific results and less on the technical details. In this paper, a generic framework for managing and automating the execution of jobs is presented. It uses a variety of information models describing systems, policies, and application details and requirements to make suitable decisions on where and how to run, checkpoint, migrate, and reconfigure jobs as needed. To demonstrate the utility of the framework, it is used as part of a simulation study to assess the impact that the availability of application memory usage information has on meeting the QoS objectives of job submitters and on the overall utilization of resources. The study shows that with greater availability of memory usage information, the execution management framework is better able to meet user objectives and improve utilization of resources, particularly when the objective is to make more efficient use of resources.
  • Keywords
    fault tolerance; grid computing; resource allocation; QoS objective; generic execution management framework; heterogeneous grid computing; memory usage information; Application Modelling; Automation; Execution Management; Grid Computing; Simulation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing and Communications (HPCC), 2010 12th IEEE International Conference on
  • Conference_Location
    Melbourne, VIC
  • Print_ISBN
    978-1-4244-8335-8
  • Electronic_ISBN
    978-0-7695-4214-0
  • Type

    conf

  • DOI
    10.1109/HPCC.2010.117
  • Filename
    5581486