• DocumentCode
    1496346
  • Title

    Predictable High-Performance Computing Using Feedback Control and Admission Control

  • Author

    Park, Sang-Min ; Humphrey, Marty A.

  • Author_Institution
    Dept. of Comput. Sci., Univ. of Virginia, Charlottesville, VA, USA
  • Volume
    22
  • Issue
    3
  • fYear
    2011
  • fDate
    3/1/2011 12:00:00 AM
  • Firstpage
    396
  • Lastpage
    411
  • Abstract
    Historically, batch scheduling has dominated the management of High-Performance Computing (HPC) resources. One of the most significant limitations of this approach is its inability to predict both the start time and end time of jobs. Although existing research such as resource reservation and queue-time prediction partially addresses this issue, a more predictable HPC system is needed, particularly for an emerging class of adaptive real-time HPC applications. This paper presents the design and implementation of a predictable HPC system using feedback control and admission control. By creating a virtualized application layer and opportunistically multiplexing concurrent applications through the application of formal control theory, we regulate a job's progress such that the job meets its deadline without requiring exclusive access to resources, even in the presence of a wide class of unexpected events. Admission control regulates access to resources when they are oversubscribed. Our experimental results using five widely used applications show that the feedback and admission controllers achieve a highly predictable HPC system. The designed feedback controller regulates an HPC job's progress accurately, close to the prediction by theory, thereby showing the successful application of classic control theory to HPC workloads. In week-long experiments, over 90 percent of jobs met their deadlines, and the jobs that missed deadlines still finished close to the requested deadlines (12.4 percent error).
  • Keywords
    distributed processing; feedback; scheduling; virtual machines; admission control; batch scheduling; feedback control; high-performance computing; opportunistic multiplexing concurrent application; predictable HPC system; queue-time prediction; resource reservation; virtualized application layer; Multiprocessor systems; control theory; parallel systems
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/TPDS.2010.100
  • Filename
    5467065