• DocumentCode
    1916646
  • Title
    Improving Data Analysis Performance for High-Performance Computing with Integrating Statistical Metadata in Scientific Datasets
  • Author
    Jialin Liu; Yong Chen
  • Author_Institution
    Comput. Sci. Dept., Texas Tech Univ., Lubbock, TX, USA
  • fYear
    2012
  • fDate
    10-16 Nov. 2012
  • Firstpage
    1292
  • Lastpage
    1295
  • Abstract
    Scientific datasets and libraries, such as HDF5, ADIOS, and NetCDF, have been used widely in many data-intensive applications. These libraries have their own file formats and I/O functions to provide efficient access to large datasets. As data sizes keep increasing, these high-level I/O libraries face new challenges. Recent studies have started to utilize database techniques, such as indexing and subsetting, and data reorganization to manage the growing datasets. In this work, we present a new approach to boost data analysis performance, namely Fast Analysis with Statistical Metadata (FASM), via data subsetting and integrating a small amount of statistics into the original datasets. The added statistical information describes the data shape and provides knowledge of the data distribution; therefore, the original I/O libraries can utilize these statistical metadata to perform fast queries and analyses. The proposed FASM approach is currently evaluated with PnetCDF on Lustre file systems, but can also be implemented with other scientific libraries. FASM can potentially lead to a new dataset design and can have an impact on big data analysis.
  • Keywords
    data analysis; meta data; parallel processing; statistical analysis; ADIOS library; FASM approach; HDF5 library; Lustre file system; NetCDF library; data analysis performance; data reorganization; database technique; high-performance computing; indexing technique; input-output function; scientific dataset; statistical information; statistical metadata; subsetting technique; FASM; big data; data intensive computing; high performance computing; statistical techniques; storage systems
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion:
  • Conference_Location
    Salt Lake City, UT
  • Print_ISBN
    978-1-4673-6218-4
  • Type
    conf
  • DOI
    10.1109/SC.Companion.2012.156
  • Filename
    6495938
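
The idea summarized in the abstract, splitting a dataset into subsets and storing a few statistics per subset so that queries can avoid reading most of the data, can be illustrated with a minimal sketch. This is not the authors' FASM implementation and does not use PnetCDF: an in-memory Python list stands in for the dataset, and the names `BlockStats`, `build_stats`, `count_greater_than`, and `mean_from_stats` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class BlockStats:
    """Hypothetical per-subset statistical metadata (assumption, not the paper's layout)."""
    start: int      # index of the first element in the block
    stop: int       # index one past the last element in the block
    minimum: float
    maximum: float
    total: float    # sum of the block's values


def build_stats(data: Sequence[float], block_size: int) -> List[BlockStats]:
    """Split the data into fixed-size blocks and record min, max, and sum for each."""
    stats = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        stats.append(
            BlockStats(start, start + len(block), min(block), max(block), sum(block))
        )
    return stats


def count_greater_than(
    data: Sequence[float], stats: List[BlockStats], threshold: float
) -> Tuple[int, int]:
    """Count values above the threshold; return (count, number of blocks actually read)."""
    count = 0
    blocks_read = 0
    for s in stats:
        if s.maximum <= threshold:
            continue                      # no value can qualify: skip the block
        if s.minimum > threshold:
            count += s.stop - s.start     # every value qualifies: no read needed
            continue
        blocks_read += 1                  # statistics are inconclusive: scan the block
        count += sum(1 for v in data[s.start:s.stop] if v > threshold)
    return count, blocks_read


def mean_from_stats(stats: List[BlockStats]) -> float:
    """Compute the global mean from the stored sums without touching the raw data."""
    n = sum(s.stop - s.start for s in stats)
    return sum(s.total for s in stats) / n


data = [1, 2, 3, 4, 10, 11, 12, 13, 3, 9, 4, 8]
stats = build_stats(data, block_size=4)
result = count_greater_than(data, stats, threshold=5)
average = mean_from_stats(stats)
```

In this toy example only one of the three blocks has to be scanned for the threshold query, and the mean is answered from the metadata alone. A real deployment would presumably store such statistics alongside the dataset in the file format itself, which is outside the scope of this sketch.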