Title :
Just-in-time staging of large input data for supercomputing jobs
Author :
Monti, Henry M. ; Butt, Ali R. ; Vazhkudai, Sudharshan S.
Author_Institution :
Virginia Tech, Blacksburg, VA
Abstract :
High-performance computing is facing a data deluge from state-of-the-art colliders and observatories. Large datasets from these facilities, and from other end-user sites, are often inputs to intensive analyses on modern supercomputers. Timely staging of input data into the supercomputer's local storage can not only optimize space usage but also protect against delays due to storage system failures. To this end, we propose a just-in-time staging framework that combines batch-queue predictions, user-specified intermediate nodes, and decentralized data delivery so that input data staging coincides with job startup. Our preliminary prototype has been integrated with widely used tools such as the PBS job submission system, BitTorrent data delivery, and the Network Weather Service network monitoring facility.
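The core timing idea of just-in-time staging can be illustrated with a minimal sketch: start the transfer at the predicted job start time minus the estimated transfer time, so the data arrives as the job begins. This is not the authors' implementation. The function name `plan_staging`, the `safety_factor` margin, and the use of a single bandwidth forecast (for example, one obtained from a network monitoring service) are hypothetical assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class StagingPlan:
    start_delay_s: float    # seconds after job submission to begin staging
    transfer_time_s: float  # estimated (padded) duration of the transfer


def plan_staging(predicted_wait_s: float, data_bytes: int,
                 bandwidth_bps: float, safety_factor: float = 1.2) -> StagingPlan:
    """Hypothetical sketch: schedule staging to finish just before job start.

    predicted_wait_s -- assumed batch-queue wait prediction, in seconds
    data_bytes       -- size of the input dataset, in bytes
    bandwidth_bps    -- assumed bandwidth forecast, in bits per second
    safety_factor    -- hypothetical margin (>= 1) padding the transfer estimate
    """
    if bandwidth_bps <= 0:
        raise ValueError("bandwidth must be positive")
    if safety_factor < 1:
        raise ValueError("safety_factor must be >= 1")
    # Estimated transfer time, padded to absorb forecast error.
    transfer_s = data_bytes * 8 / bandwidth_bps * safety_factor
    # Start as late as possible, but never before submission (delay >= 0).
    start_s = max(0.0, predicted_wait_s - transfer_s)
    return StagingPlan(start_delay_s=start_s, transfer_time_s=transfer_s)


if __name__ == "__main__":
    # 100 GB input, 1 Gbit/s forecast, job predicted to start in one hour.
    plan = plan_staging(3600, 100 * 10**9, 1e9)
    print(plan)
```

With these inputs the padded transfer estimate is 960 s, so staging would begin about 2640 s after submission. If the predicted wait is shorter than the transfer estimate, the sketch starts staging immediately; a real system would also need to re-plan when queue predictions or bandwidth forecasts change.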
Keywords :
parallel machines; storage management; batch queue prediction; decentralized data delivery; end-user sites; high performance computing; just-in-time staging framework; large datasets; large input data; local storage; modern supercomputers; storage system failure; supercomputing jobs; user specified intermediate nodes; Computerized monitoring; Delay; File systems; Laboratories; Large Hadron Collider; Observatories; Protection; Prototypes; Supercomputers; Weather forecasting;
Conference_Title :
Petascale Data Storage Workshop, 2008. PDSW '08. 3rd
Conference_Location :
Austin, TX
Print_ISBN :
978-1-4244-4208-9
DOI :
10.1109/PDSW.2008.4811891