DocumentCode
580131
Title
PLFS: a checkpoint filesystem for parallel applications
Author
Bent, John ; Gibson, Garth ; Grider, Gary ; McClelland, B. ; Nowoczynski, P. ; Nunez, Juan ; Polte, M. ; Wingate, M.
Author_Institution
LANL Tech. Inf., Los Alamos Nat. Lab., Los Alamos, NM, USA
fYear
2009
fDate
14-20 Nov. 2009
Firstpage
1
Lastpage
12
Abstract
Parallel applications running across thousands of processors must protect themselves from inevitable system failures. Many applications insulate themselves from failures by checkpointing. For many applications, checkpointing into a single shared file is most convenient. With such an approach, writes are often small and not aligned with file system boundaries. Unfortunately for these applications, this preferred data layout results in pathologically poor performance from the underlying file system, which is optimized for large, aligned writes to non-shared files. To address this fundamental mismatch, we have developed a virtual parallel log structured file system, PLFS. PLFS remaps an application's preferred data layout into one that is optimized for the underlying file system. Through testing on PanFS, Lustre, and GPFS, we have seen that this layer of indirection and reorganization can reduce checkpoint time by an order of magnitude for several important benchmarks and real applications, without any application modification.
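The remapping idea in the abstract can be illustrated with a minimal sketch. It assumes a simplified model in which each writer appends its data to a private, append-only log and records an index entry (logical offset, length, writer, physical offset, timestamp); a read reassembles the logical shared file by replaying the index. The class and field names are hypothetical, and in-memory bytearrays stand in for the per-process files a real implementation would place on the underlying file system. This is an illustration of log-structured remapping under those assumptions, not PLFS's actual code or on-disk format.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    # Hypothetical index record mapping a logical extent to a writer's log.
    logical_offset: int
    length: int
    writer: int
    physical_offset: int
    timestamp: int


class LogStructuredFile:
    """Hypothetical in-memory model of one logical shared file."""

    def __init__(self) -> None:
        self.logs: dict[int, bytearray] = {}  # writer id -> private append-only log
        self.index: list[IndexEntry] = []     # one entry per write call
        self._clock = 0                       # logical timestamp for ordering writes

    def write(self, writer: int, logical_offset: int, data: bytes) -> None:
        # Every write is a sequential append to the writer's own log,
        # regardless of where it lands in the logical file.
        log = self.logs.setdefault(writer, bytearray())
        physical_offset = len(log)
        log.extend(data)
        self._clock += 1
        self.index.append(
            IndexEntry(logical_offset, len(data), writer, physical_offset, self._clock)
        )

    def read(self, logical_offset: int, length: int) -> bytes:
        # Unwritten regions read back as zero bytes (holes).
        out = bytearray(length)
        # Replay entries oldest first so later writes overwrite earlier ones.
        for e in sorted(self.index, key=lambda entry: entry.timestamp):
            start = max(logical_offset, e.logical_offset)
            end = min(logical_offset + length, e.logical_offset + e.length)
            if start >= end:
                continue  # no overlap with the requested range
            src = e.physical_offset + (start - e.logical_offset)
            out[start - logical_offset:end - logical_offset] = (
                self.logs[e.writer][src:src + (end - start)]
            )
        return bytes(out)


if __name__ == "__main__":
    # Strided N-to-1 pattern: 4 writers interleave small 3-byte records.
    f = LogStructuredFile()
    record = 3
    for step in range(2):
        for rank in range(4):
            offset = (step * 4 + rank) * record
            f.write(rank, offset, bytes([65 + rank]) * record)
    print(f.read(0, 24))    # logical view of the shared file
    print(bytes(f.logs[0]))  # writer 0's log holds only its own data, contiguously
```

In this model, the small, interleaved writes to the shared logical file become contiguous appends to per-writer logs. That is the kind of layout the abstract describes the underlying file system as being optimized for. The cost is an index that must be consulted on read.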
Keywords
checkpointing; file organisation; parallel processing; GPFS; Lustre; PLFS; PanFS; filesystem checkpointing; parallel application; virtual parallel log structured file system; check-pointing; high performance computing; parallel computing; parallel file systems and IO;
fLanguage
English
Publisher
IEEE
Conference_Titel
High Performance Computing Networking, Storage and Analysis, Proceedings of the Conference on
Conference_Location
Portland, OR
Type
conf
DOI
10.1145/1654059.1654081
Filename
6375580