Regional Information Center for Science and Technology - Trapped Capacity: Scheduling under a Power Cap to Maximize Machine-Room Throughput

DocumentCode :

233727

Title :

Trapped Capacity: Scheduling under a Power Cap to Maximize Machine-Room Throughput

Author :

Ziming Zhang; Michael Lang; Scott Pakin; Song Fu

Author_Institution :

Dept. of Comput. Sci. & Eng., Univ. of North Texas, Denton, TX, USA

fYear :

2014

fDate :

16 Nov. 2014

Firstpage :

Lastpage :

Abstract :

Power-aware parallel job scheduling has been recognized as a pressing issue in the high-performance computing (HPC) community. The goal is to allocate and use power and energy efficiently in machine rooms. In practice, the power for machine rooms is heavily over-provisioned, with the budget set by high-energy LINPACK runs or nameplate power estimates. This results in a considerable amount of trapped power capacity. Rather than being wasted, this trapped capacity should be reclaimed to accommodate more compute nodes in the machine room and thereby increase system throughput. Doing so, however, requires the ability to enforce a system-wide power cap. In this paper, we present TracSim, a full-system simulator that enables users to evaluate the performance of different policies for scheduling parallel tasks under a power cap. TracSim simulates the execution environment of an HPC cluster at Los Alamos National Laboratory (LANL), and we use real measurements from that cluster to set its configuration parameters. TracSim lets users specify the system topology, hardware configuration, power cap, and task workload, and develop resource-configuration and task-scheduling policies that exploit CPU throttling to maximize machine-room throughput while keeping power consumption under the cap. We use TracSim to implement and evaluate three resource scheduling policies. Simulation results show the performance of these policies and quantify the amount of trapped capacity that can effectively be reclaimed.
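The trapped-capacity argument and the idea of scheduling under a power cap can be illustrated with a toy model. This sketch is not TracSim and not one of the paper's three policies: the per-node wattages, the throttle levels in `THROTTLE_LEVELS`, the strict first-come-first-served ordering, and the helpers `trapped_power` and `admit_jobs` are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical throttle levels: fraction of a node's full-speed power draw
# that it consumes when its CPU is throttled (fastest level first).
THROTTLE_LEVELS = (1.0, 0.8, 0.6)


@dataclass
class Job:
    name: str
    nodes: int
    full_watts_per_node: float  # assumed per-node draw at full speed


def trapped_power(power_cap, nameplate_watts, typical_watts):
    """Return (nodes allowed by nameplate provisioning, watts they leave unused)."""
    nodes = power_cap // nameplate_watts
    return nodes, power_cap - nodes * typical_watts


def admit_jobs(queue, total_nodes, power_cap, idle_watts):
    """Hypothetical strict-FCFS admission: start each job at the fastest
    throttle level that keeps total machine-room power at or below the cap."""
    free_nodes = total_nodes
    power = total_nodes * idle_watts  # every node draws idle power
    started = []
    for job in queue:
        if job.nodes > free_nodes:
            break  # strict FCFS: no backfilling past a blocked job
        for level in THROTTLE_LEVELS:
            # Extra draw above idle when this job's nodes run at `level`.
            extra = job.nodes * (job.full_watts_per_node * level - idle_watts)
            if power + extra <= power_cap:
                power += extra
                free_nodes -= job.nodes
                started.append((job.name, level))
                break
        else:
            break  # even the slowest level would exceed the cap
    return started, power


if __name__ == "__main__":
    # Made-up numbers: 100 kW cap, 400 W nameplate, 250 W typical draw.
    print(trapped_power(100_000, 400, 250))  # (250, 37500)
    queue = [Job("A", 200, 300), Job("B", 150, 300), Job("C", 100, 300)]
    print(admit_jobs(queue, total_nodes=400, power_cap=100_000, idle_watts=100))
    # ([('A', 1.0), ('B', 0.6)], 92000.0)
```

With these made-up numbers, nameplate provisioning admits 250 nodes that typically draw only 62.5 kW, leaving 37.5 kW trapped. Installing 400 nodes instead and throttling job B lets both A and B run under the same 100 kW cap, while C waits for free nodes. A real policy would also weigh the slowdown caused by throttling, which this sketch ignores.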

Keywords :

parallel processing; power aware computing; scheduling; CPU throttling techniques; Los Alamos National Laboratory; TracSim; full-system simulator; high energy LINPACK; high-performance computing community; machine-room throughput; power-aware parallel job scheduling; resource configuration; system-wide power cap; task scheduling policies; trapped power capacity; Hardware; Power demand; Power measurement; Processor scheduling; Production; Resource management; Throughput;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Energy Efficient Supercomputing Workshop (E2SC), 2014

Conference_Location :

New Orleans, LA

Type :

conf

DOI :

10.1109/E2SC.2014.10

Filename :

7016386

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=233727