  • DocumentCode
    668124
  • Title
    Thermal aware automated load balancing for HPC applications
  • Author
    Menon, Harshitha ; Acun, Bilge ; De Gonzalo, Simon Garcia ; Sarood, Osman ; Kale, Laxmikant
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
  • fYear
    2013
  • fDate
    23-27 Sept. 2013
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    As we move towards the exascale era, power and energy have become major challenges. Some supercomputers draw more than 10 megawatts, leading to high energy bills. A significant portion of this energy is spent on cooling. In this paper, we propose an adaptive control system that minimizes the cooling energy by using Dynamic Voltage and Frequency Scaling to control the temperature and by performing load balancing. This framework, which is part of the adaptive runtime system, monitors the system and application characteristics and triggers a mechanism to limit the temperature. It also performs load balancing whenever imbalance is detected and load balancing is beneficial. We demonstrate, using a set of applications and benchmarks, that the proposed framework can control the temperature of the cores effectively and reduce the timing penalty automatically without any support from the user.
  • Keywords
    adaptive control; parallel processing; power aware computing; resource allocation; temperature control; HPC applications; adaptive control system; adaptive runtime system; cooling energy minimization; dynamic voltage and frequency scaling; exascale era; high performance computing; supercomputers; temperature control; thermal aware automated load balancing; Cooling; Heating; Load management; Runtime; Temperature; Timing; automated; dvfs; energy consumption; load balancing; parallel applications; run-time system;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing (CLUSTER), 2013 IEEE International Conference on
  • Conference_Location
    Indianapolis, IN
  • Type
    conf
  • DOI
    10.1109/CLUSTER.2013.6702627
  • Filename
    6702627