• DocumentCode
    2121407
  • Title

    HDF5-FastQuery: Accelerating Complex Queries on HDF Datasets using Fast Bitmap Indices

  • Author

    Gosink, Luke ; Shalf, John ; Stockinger, Kurt ; Wu, Kesheng ; Bethel, Wes

  • Author_Institution
    Inst. for Data Anal. & Visualization, California Univ., Davis, CA
  • fYear
    2006
  • fDate
    0-0 0
  • Firstpage
    149
  • Lastpage
    158
  • Abstract
    Large-scale scientific data is often stored in scientific data formats such as FITS, netCDF and HDF. These storage formats are of particular interest to the scientific user community since they provide multi-dimensional storage and retrieval. However, one of the drawbacks of these storage formats is that they do not support semantic indexing, which is important for interactive data analysis where scientists look for features of interest such as "find all supernova explosions where energy > 10^5 and temperature > 10^6". In this paper we present a novel approach called HDF5-FastQuery to accelerate the data access of large HDF5 files by introducing multi-dimensional semantic indexing. Our implementation leverages an efficient indexing technology called bitmap indexing that has been widely used in the database community. Bitmap indices are especially well suited for interactive exploration of large-scale read-only data. Storing the bitmap indices in the HDF5 file has the following advantages: a) significant performance speedup of accessing subsets of multi-dimensional data and b) portability of the indices across multiple computer platforms. We present an API that simplifies the execution of queries on HDF5 files for general scientific applications and data analysis. The design is flexible enough to accommodate the use of arbitrary indexing technology for semantic range queries. We also provide a detailed performance analysis of HDF5-FastQuery for both synthetic and scientific data. The results demonstrate that our proposed approach for multi-dimensional queries is up to a factor of 2 faster than HDF5.
  • Keywords
    application program interfaces; data analysis; database indexing; query processing; scientific information systems; API; HDF5-FastQuery; application program interface; bitmap indices; interactive data analysis; multidimensional retrieval; multidimensional semantic indexing; multidimensional storage; scientific data format; semantic indexing; semantic range query; Acceleration; Application software; Data analysis; Databases; Energy storage; Explosions; Indexing; Large-scale systems; Performance analysis; Temperature
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Scientific and Statistical Database Management, 2006. 18th International Conference on
  • Conference_Location
    Vienna
  • ISSN
    1551-6393
  • Print_ISBN
    0-7695-2590-3
  • Type

    conf

  • DOI
    10.1109/SSDBM.2006.27
  • Filename
    1644309