Title :
Zest Checkpoint storage system for large supercomputers
Author :
Nowoczynski, Paul ; Stone, Nathan ; Yanovich, Jared ; Sommerfield, Jason
Author_Institution :
Pittsburgh Supercomputing Center, Pittsburgh, PA
Abstract :
The PSC has developed a prototype distributed file system infrastructure that vastly accelerates aggregate write bandwidth on large compute platforms. Write bandwidth, more than read bandwidth, is the dominant bottleneck in HPC I/O scenarios because of the need to write checkpoint data, visualization data, and post-processing (multi-stage) data. We have prototyped a scalable solution that will be directly applicable to future petascale compute platforms with on the order of 10^6 cores. Our design emphasizes high-efficiency scalability, low-cost commodity components, lightweight software layers, end-to-end parallelism, client-side caching and software parity, and a unique model of load-balancing outgoing I/O onto high-speed intermediate storage, followed by asynchronous reconstruction to a third-party parallel file system.
Keywords :
checkpointing; data visualisation; input-output programs; mainframes; parallel processing; program verification; resource allocation; HPC I-O scenarios; asynchronous reconstruction; checkpoint storage system; client-side caching; data checkpoint; data visualization; end-to-end parallelism; high-speed intermediate storage; load-balancing; parallel file system; petascale compute platforms; post-processing data; prototype distributed file system infrastructure; software layers; software parity; Acceleration; Bandwidth; Data visualization; Distributed computing; File systems; Petascale computing; Prototypes; Software prototyping; Supercomputers; Writing; Client-side Raid; High-performance commodity storage; Parallel Application Checkpoint; Parallel I/O; Petascale Storage; Terabytes per second; log-structured filesystems;
Conference_Title :
Petascale Data Storage Workshop, 2008. PDSW '08. 3rd
Conference_Location :
Austin, TX
Print_ISBN :
978-1-4244-4208-9
DOI :
10.1109/PDSW.2008.4811883