Title :
A Clustered Index Approach to Distributed XPath Processing
Author :
Koloniari, Georgia ; Pitoura, Evaggelia
Author_Institution :
Comput. Sci. Dept., Univ. of Ioannina, Ioannina
Abstract :
Supporting top-k queries over distributed collections of schemaless XML data poses two challenges. First, while XML supports expressive query languages such as XPath and XQuery, writing an appropriate query in these languages requires schema knowledge, which may not be available in distributed systems with autonomous and dynamic sources. Thus, there is a need for approximate query processing. Second, retrieving the top-k results incurs large communication and processing costs, since partial result lists from numerous sites need to be combined and ranked to assemble the top-k answers. To address both of these issues, we present an approach for approximate XPath processing over distributed collections of XML data based on a clustered path index, where data is grouped based on structural information. Our method gradually generalizes a query by applying a set of structural transformations to it, and the retrieved results are ranked based on the edit distance between two path expressions. A compact indexing data structure is used to reduce the index construction cost. Our experimental results show that our approach significantly reduces the communication cost for retrieving the top-k results, while maintaining a low construction cost for the clustered index.
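The ranking idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact distance or the set of structural transformations, so the code below assumes a plain Levenshtein distance computed over path steps (insert, delete, or rename a step, each at cost 1). The function names `path_steps`, `path_edit_distance`, and `top_k` are hypothetical and are not taken from the paper.

```python
import heapq


def path_steps(path):
    """Split a simple path expression such as '/a/b/c' into its steps."""
    return [step for step in path.split("/") if step]


def path_edit_distance(query, candidate):
    """Step-level Levenshtein distance between two simple path expressions.

    Assumption: unit cost for inserting, deleting, or renaming a step.
    The paper's actual cost model may differ.
    """
    q, c = path_steps(query), path_steps(candidate)
    # prev[j] holds the distance between the first i-1 query steps
    # and the first j candidate steps.
    prev = list(range(len(c) + 1))
    for i, q_step in enumerate(q, start=1):
        curr = [i]
        for j, c_step in enumerate(c, start=1):
            rename_cost = 0 if q_step == c_step else 1
            curr.append(min(prev[j] + 1,                 # delete a query step
                            curr[j - 1] + 1,             # insert a candidate step
                            prev[j - 1] + rename_cost))  # keep or rename
        prev = curr
    return prev[-1]


def top_k(query, candidates, k):
    """Return the k candidate paths closest to the query path."""
    return heapq.nsmallest(
        k, candidates, key=lambda p: path_edit_distance(query, p))
```

In this sketch, a candidate path that matches the query exactly has distance 0, and each structural relaxation (dropping, adding, or renaming a step) increases the distance by 1, so less generalized matches are ranked first.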
Keywords :
XML; data structures; distributed processing; pattern clustering; query languages; XQuery; clustered index approach; clustered path index; distributed XPath processing; index construction cost; indexing data structure; query languages; schemaless XML data; structural information; top-k queries; Assembly; Computer science; Costs; Database languages; Distributed computing; Information retrieval; Phase measurement; Query processing; Routing; XML;
Conference_Title :
Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on
Conference_Location :
Cancun
Print_ISBN :
978-1-4244-1836-7
Electronic_ISBN :
978-1-4244-1837-4
DOI :
10.1109/ICDE.2008.4497608