• DocumentCode
    1186216
  • Title
    Exploring Correlated Subspaces for Efficient Query Processing in Sparse Databases
  • Author
    Cui, Bin; Zhao, Jiakui; Yang, Dongqing
  • Author_Institution
    Key Lab. of High Confidence Software Technol., Peking Univ., Beijing, China
  • Volume
    22
  • Issue
    2
  • fYear
    2010
  • Firstpage
    219
  • Lastpage
    233
  • Abstract
    Sparse data are becoming increasingly common and available in many real-life applications. However, relatively little attention has been paid to effectively modeling sparse data, and existing approaches such as the conventional "horizontal" and "vertical" representations fail to provide satisfactory performance for both storage and query processing, as such approaches are too rigid and generally do not consider the dimension correlations. In this paper, we propose a new approach, named HoVer, to store and query sparse data sets in an unmodified RDBMS, where HoVer stands for horizontal representation over vertically partitioned subspaces. According to the dimension correlations of sparse data sets, a novel mechanism has been developed to vertically partition a high-dimensional sparse data set into multiple lower-dimensional subspaces, such that dimensions are highly correlated within each subspace and highly unrelated across subspaces. Therefore, original data objects can be represented in the horizontal format within their respective subspaces. With the novel HoVer representation, users can write SQL queries over the original horizontal view, which can be easily rewritten into queries over the subspace tables. Experiments over synthetic and real-life data sets show that our approach is effective in finding correlated subspaces and yields superior performance for the storage and query of sparse data.
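    The idea of a horizontal representation over vertically partitioned subspaces can be illustrated with a small sketch. The code below is not the paper's algorithm: it assumes a simple greedy grouping of dimensions by Jaccard similarity of their non-null row sets, and the names `partition_dimensions`, `build_subspace_tables`, `lookup`, the `threshold` value, and the sample rows are all hypothetical. In-memory dictionaries stand in for the RDBMS subspace tables.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of row ids."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def partition_dimensions(rows, threshold=0.5):
    """Greedily group dimensions that tend to be non-null in the same rows.

    Assumption: co-occurrence (Jaccard) stands in for the paper's
    correlation measure, which the abstract does not specify.
    """
    # Map each dimension to the set of row ids where it has a value.
    support = {}
    for rid, row in rows.items():
        for dim in row:
            support.setdefault(dim, set()).add(rid)

    subspaces = []
    for dim in sorted(support):
        for group in subspaces:
            # Join the first group whose every member co-occurs often enough.
            if all(jaccard(support[dim], support[other]) >= threshold
                   for other in group):
                group.append(dim)
                break
        else:
            subspaces.append([dim])
    return subspaces


def build_subspace_tables(rows, subspaces):
    """Store each object horizontally inside every subspace it touches."""
    tables = []
    for group in subspaces:
        table = {}
        for rid, row in rows.items():
            projected = {d: row[d] for d in group if d in row}
            if projected:  # skip objects that are all-null in this subspace
                table[rid] = projected
        tables.append(table)
    return tables


def lookup(tables, subspaces, rid, dim):
    """Rewrite a lookup on the original wide view into one subspace lookup."""
    for group, table in zip(subspaces, tables):
        if dim in group:
            return table.get(rid, {}).get(dim)
    return None


# Hypothetical sparse data set: row id -> non-null dimensions only.
rows = {
    1: {"title": "a", "author": "x"},
    2: {"title": "b", "author": "y"},
    3: {"pixels": 12, "lens": "z"},
    4: {"pixels": 8, "lens": "w", "title": "c"},
}

subspaces = partition_dimensions(rows)
tables = build_subspace_tables(rows, subspaces)
print(subspaces)
print(lookup(tables, subspaces, 4, "title"))
```

    In this sketch each subspace table holds only the objects with at least one value in that subspace, which is where the storage saving over a single wide, mostly-null table would come from. A lookup on the original view touches only the subspace that owns the requested dimension.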
  • Keywords
    data structures; database management systems; query processing; HoVer; SQL queries; data representation; high-dimensional sparse data set; multiple lower-dimensional subspaces; sparse databases; unmodified RDBMS; correlation; subspace
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2009.66
  • Filename
    4798168