• DocumentCode
    3268977
  • Title
    Multiresolution indexing of XML for frequent queries
  • Author
    He, Hao ; Yang, Jun
  • Author_Institution
    Dept. of Comput. Sci., Duke Univ., Durham, NC, USA
  • fYear
    2004
  • fDate
    30 March-2 April 2004
  • Firstpage
    683
  • Lastpage
    694
  • Abstract
    XML and other types of semistructured data are typically represented by a labeled directed graph. To speed up path expression queries over the graph, a variety of structural indexes have been proposed. They usually work by partitioning nodes in the data graph into equivalence classes and storing the equivalence classes as index nodes. The A(k)-index introduces the concept of local bisimilarity for partitioning, allowing a trade-off between index size and query answering power. However, all index nodes in the A(k)-index have the same local similarity k, so it cannot take advantage of the fact that a workload may contain path expressions of different lengths, or that different parts of the data graph may have different local similarity requirements. To overcome these limitations, we propose the M(k)- and M*(k)-indexes. The basic M(k)-index is workload-aware: like the previously proposed D(k)-index, it allows different index nodes to have different local similarity requirements, providing finer partitioning only for parts of the data graph targeted by longer path expressions. Unlike the D(k)-index, the M(k)-index is never over-refined for irrelevant index or data nodes. However, the workload-aware feature still incurs over-refinement due to over-qualified parent index nodes. Moreover, fine partitions penalize the performance of short path expressions. To solve these problems, we further propose the M*(k)-index. An M*(k)-index consists of a collection of indexes whose nodes are organized in a partition hierarchy, allowing successively coarser partitioning information to co-exist with the finest partitioning information required. Experiments show that our indexes are superior to previously proposed indexes in terms of index size and query performance.
  • Keywords
    XML; bisimulation equivalence; data structures; database indexing; directed graphs; equivalence classes; query processing; A(k)-index; M(k)-index; data graph; expression queries; frequent queries; labeled directed graph; local bisimilarity; multiresolution indexing; over-qualified parent index node; partitioning nodes; path expression; query answering power; semistructured data; short path expression; structural index; workload-aware feature; Computer science; Data engineering; Data models; Database languages; Engineering profession; Helium; Indexing; Internet
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2004. Proceedings. 20th International Conference on
  • ISSN
    1063-6382
  • Print_ISBN
    0-7695-2065-0
  • Type
    conf
  • DOI
    10.1109/ICDE.2004.1320037
  • Filename
    1320037
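
The abstract describes structural indexes that partition data-graph nodes into equivalence classes by local bisimilarity k. The following is a minimal sketch of that partitioning step for a uniform k, in the style of the A(k)-index; it is not the M(k) or M*(k) construction from the paper. It assumes the usual definition of k-bisimilarity (same label at k = 0; at each further level, the same set of parent classes from the previous level). The function name `k_bisimulation_partition` and the toy graph are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def k_bisimulation_partition(labels, edges, k):
    """Partition nodes into k-bisimilarity classes (illustrative sketch).

    labels: dict mapping node -> label
    edges:  iterable of (parent, child) pairs
    k:      local similarity; larger k gives a finer partition
    """
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)

    # k = 0: two nodes are equivalent when they carry the same label.
    block = dict(labels)

    for _ in range(k):
        # A node's new class is determined by its old class together
        # with the set of old classes of its parents.
        signature = {
            node: (block[node], frozenset(block[p] for p in parents[node]))
            for node in labels
        }
        ids = {}
        block = {
            node: ids.setdefault(sig, len(ids))
            for node, sig in signature.items()
        }

    classes = defaultdict(set)
    for node, b in block.items():
        classes[b].add(node)
    return sorted(sorted(c) for c in classes.values())


# Hypothetical toy graph: r -> a -> c -> d and r -> b -> c -> d.
labels = {0: "r", 1: "a", 2: "b", 3: "c", 4: "c", 5: "d", 6: "d"}
edges = [(0, 1), (0, 2), (1, 3), (2, 4), (3, 5), (4, 6)]

for k in range(3):
    print(k, k_bisimulation_partition(labels, edges, k))
```

On this toy graph the two `d` nodes stay in one class for k = 0 and k = 1 and are separated only at k = 2, which illustrates the trade-off the abstract mentions: a small k keeps the index compact, while longer path expressions need a larger k to be answered precisely from the index alone.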