DocumentCode :
3268977
Title :
Multiresolution indexing of XML for frequent queries
Author :
He, Hao ; Yang, Jun
Author_Institution :
Dept. of Comput. Sci., Duke Univ., Durham, NC, USA
fYear :
2004
fDate :
30 March-2 April 2004
Firstpage :
683
Lastpage :
694
Abstract :
XML and other types of semistructured data are typically represented by a labeled directed graph. To speed up path expression queries over the graph, a variety of structural indexes have been proposed. They usually work by partitioning nodes in the data graph into equivalence classes and storing the equivalence classes as index nodes. A(k)-index introduces the concept of local bisimilarity for partitioning, allowing a trade-off between index size and query answering power. However, all index nodes in A(k)-index have the same local bisimilarity k, so it cannot exploit the fact that a workload may contain path expressions of different lengths, or that different parts of the data graph may have different local similarity requirements. To overcome these limitations, we propose M(k)- and M*(k)-indexes. The basic M(k)-index is workload-aware: like the previously proposed D(k)-index, it allows different index nodes to have different local similarity requirements, providing finer partitioning only for the parts of the data graph targeted by longer path expressions. Unlike D(k)-index, M(k)-index is never over-refined for irrelevant index or data nodes. However, the workload-aware feature still incurs over-refinement due to over-qualified parent index nodes. Moreover, fine partitions penalize the performance of short path expressions. To solve these problems, we further propose the M*(k)-index. An M*(k)-index consists of a collection of indexes whose nodes are organized in a partition hierarchy, allowing successively coarser partitioning information to co-exist with the finest partitioning information required. Experiments show that our indexes are superior to previously proposed indexes in terms of index size and query performance.
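The partitioning step behind A(k)-index can be sketched as iterative refinement. This minimal Python sketch assumes the standard backward k-bisimilarity definition: two nodes are 0-bisimilar when they share a label, and k-bisimilar when they are (k-1)-bisimilar and their parents fall into the same set of (k-1)-classes. The function names `ak_partition` and `index_nodes` and the toy graph are hypothetical illustrations; this is not the authors' M(k)- or M*(k)-index construction, which lets different index nodes carry different requirements instead of one global k.

```python
from collections import defaultdict


def _relabel(signature):
    """Map each node to a small integer block ID derived from its signature."""
    ids = {}
    return {node: ids.setdefault(sig, len(ids)) for node, sig in signature.items()}


def ak_partition(labels, edges, k):
    """Partition nodes by backward k-bisimilarity (along incoming edges).

    labels: dict mapping node -> label (assumed to cover every node in edges)
    edges:  iterable of (parent, child) pairs
    Returns a dict mapping node -> block ID; each block is one index node.
    """
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)

    # Round 0: nodes are 0-bisimilar iff they carry the same label.
    block = _relabel(dict(labels))
    for _ in range(k):
        # Round i: keep the (i-1)-block and add the set of (i-1)-blocks of
        # the node's parents, so blocks can only split, never merge.
        signature = {
            node: (block[node], frozenset(block[p] for p in parents[node]))
            for node in labels
        }
        refined = _relabel(signature)
        if len(set(refined.values())) == len(set(block.values())):
            break  # no block split, so later rounds would not split either
        block = refined
    return block


def index_nodes(block):
    """Group data nodes into index nodes (equivalence classes)."""
    groups = defaultdict(set)
    for node, b in block.items():
        groups[b].add(node)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical toy graph: two "C" nodes reached via differently labeled parents.
LABELS = {"r": "root", "a": "A", "b": "B", "x": "C", "y": "C"}
EDGES = [("r", "a"), ("r", "b"), ("a", "x"), ("b", "y")]

if __name__ == "__main__":
    for k in (0, 1):
        print(k, index_nodes(ak_partition(LABELS, EDGES, k)))
```

The toy graph shows the single-k trade-off described above: with k = 0 the two "C" nodes share one index node, while k = 1 splits them because their parents carry different labels. A larger k yields a larger but more precise index for every node at once.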
Keywords :
XML; bisimulation equivalence; data structures; database indexing; directed graphs; equivalence classes; query processing; A(k)-index; M(k)-index; data graph; expression queries; frequent queries; labeled directed graph; local bisimilarity; multiresolution indexing; over-qualified parent index node; partitioning nodes; path expression; query answering power; semistructured data; short path expression; structural index; workload-aware feature; Computer science; Data engineering; Data models; Database languages; Engineering profession; Indexing; Internet
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Engineering, 2004. Proceedings. 20th International Conference on
ISSN :
1063-6382
Print_ISBN :
0-7695-2065-0
Type :
conf
DOI :
10.1109/ICDE.2004.1320037
Filename :
1320037