DocumentCode :
2534480
Title :
Scheduling a 100,000 Core Supercomputer for Maximum Utilization and Capability
Author :
Andrews, Phil ; Kovatch, Patricia ; Hazlewood, Victor ; Baer, Troy
Author_Institution :
Nat. Inst. for Comput. Sci., U. of Tennessee, Oak Ridge, TN, USA
fYear :
2010
fDate :
13-16 Sept. 2010
Firstpage :
421
Lastpage :
427
Abstract :
In late 2009, the National Institute for Computational Sciences placed in production the world's fastest academic supercomputer (third overall), a Cray XT5 named Kraken, with almost 100,000 compute cores and a peak speed in excess of one Petaflop. Delivering over 50% of the total cycles available to the National Science Foundation users via the TeraGrid, Kraken has two missions that have historically proven difficult to reconcile: providing the maximum number of total cycles to the community, while enabling full machine runs for “hero” users. Historically, this has been attempted by allowing schedulers to choose the correct time for the beginning of large jobs, with a concomitant reduction in utilization. At NICS, we used the results of a previous theoretical investigation to adopt a different approach, where the “clearing out” of the system is forced on a weekly basis, followed by consecutive full machine runs. As our previous simulation results suggested, this led to a significant improvement in utilization, to over 90%. The difference in utilization between the traditional and adopted scheduling policies was the equivalent of a 300+ Teraflop supercomputer, or several million dollars of compute time per year.
Keywords :
parallel machines; processor scheduling; Cray XT5; Kraken; Petaflop; TeraGrid; Teraflop supercomputer; core supercomputer scheduling; high-performance computing; Aggregates; Processor scheduling; Production; Resource management; Runtime; Supercomputers; High-performance computing; scheduling; systems software;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel Processing Workshops (ICPPW), 2010 39th International Conference on
Conference_Location :
San Diego, CA
ISSN :
1530-2016
Print_ISBN :
978-1-4244-7918-4
Electronic_ISBN :
1530-2016
Type :
conf
DOI :
10.1109/ICPPW.2010.63
Filename :
5599101