• DocumentCode
    2405775
  • Title
    Efficient structured data access in parallel file systems
  • Author
    Ching, Avery ; Choudhary, Alok ; Liao, Wei-keng ; Ross, Robert ; Gropp, William
  • Author_Institution
    Electr. & Comput. Eng. Dept., Northwestern Univ., Evanston, IL, USA
  • fYear
    2003
  • fDate
    1-4 Dec. 2003
  • Firstpage
    326
  • Lastpage
    335
  • Abstract
    Parallel scientific applications store and retrieve very large, structured datasets. Directly supporting these structured accesses is an important step in providing high-performance I/O solutions for these applications. High-level interfaces such as HDF5 and Parallel netCDF provide convenient APIs for accessing structured datasets, and the MPI-IO interface also supports efficient access to structured data. However, parallel file systems do not traditionally support such access. In this work we present an implementation of structured data access support in the context of the Parallel Virtual File System (PVFS). We call this support "datatype I/O" because of its similarity to MPI datatypes. This support is built by using a reusable datatype-processing component from the MPICH2 MPI implementation. We describe how this component is leveraged to efficiently process structured data representations resulting from MPI-IO operations. We quantitatively assess the solution using three test applications. We also point to further optimizations in the processing path that could be leveraged for even more efficient operation.
  • Keywords
    application program interfaces; data structures; distributed databases; message passing; network operating systems; very large databases; API; HDF5; I/O solutions; MPI datatypes; MPI-IO interface; MPI-IO operations; MPICH2 MPI; PVFS; Parallel netCDF; datatype I/O; datatype-processing component; high-level interfaces; parallel file systems; parallel scientific applications; parallel virtual file system; structured data access; structured data representations; structured dataset retrieval; structured dataset storage; Application software; Computer science; Concurrent computing; Data engineering; Data structures; Database systems; Distributed database management systems; File systems; Information retrieval; Laboratories; Libraries; Mathematics; Message passing; Network operating systems; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing, 2003. Proceedings. 2003 IEEE International Conference on
  • Print_ISBN
    0-7695-2066-9
  • Type
    conf
  • DOI
    10.1109/CLUSTR.2003.1253331
  • Filename
    1253331