DocumentCode
2428150
Title
A fault tolerant MPI-IO implementation using the Expand parallel file system
Author
Calderón, A. ; García-Carballeira, F. ; Carretero, J. ; Pérez, J.M. ; Sánchez, L.M.
Author_Institution
Comput. Sci. Dept., Univ. Carlos III de Madrid, Spain
fYear
2005
fDate
9-11 Feb. 2005
Firstpage
274
Lastpage
281
Abstract
Parallelism in file systems is obtained by using several independent server nodes supporting one or more secondary storage devices. This approach increases the performance and scalability of the system, but a fault in a single node can stop the whole system. To avoid this problem, data must be stored using some kind of redundancy technique, so that any data stored in a faulty element can be recovered. Fault tolerance can be provided in I/O systems using replication or RAID-based schemes. However, most current systems apply the same technique to all files in the system. This paper describes the fault tolerance support provided by Expand, a parallel file system based on standard servers. Expand allows different fault-tolerance mechanisms to be defined at the file level. The evaluation compares the performance of Expand under different configurations with that of PVFS using the FLASH-I/O benchmark.
Keywords
application program interfaces; message passing; network operating systems; software fault tolerance; I/O system; MPI; RAID; data declustering; fault tolerance; fault tolerant MPI-IO; message passing interface; parallel file system; replication scheme; Computer architecture; Contracts; Fault tolerance; Fault tolerant systems; File servers; File systems; Parallel processing; Redundancy; Scalability; Storage area networks; Fault-Tolerance; NFS; Parallel File System; clusters
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel, Distributed and Network-Based Processing, 2005. PDP 2005. 13th Euromicro Conference on
ISSN
1066-6192
Print_ISBN
0-7695-2280-7
Type
conf
DOI
10.1109/EMPDP.2005.3
Filename
1386069