Title :
Locating XML Documents in a Peer-to-Peer Network Using Distributed Hash Tables
Author :
Rao, Praveen R. ; Moon, Bongki
Author_Institution :
Dept. of Comput. Sci. & Electr. Eng., Univ. of Missouri-Kansas City, Kansas City, MO, USA
Abstract :
One of the key challenges in a peer-to-peer (P2P) network is to efficiently locate relevant data sources across a large number of participating peers. With the increasing popularity of the extensible markup language (XML) as a standard for information interchange on the Internet, XML is commonly used as an underlying data model for P2P applications to deal with the heterogeneity of data and enhance the expressiveness of queries. In this paper, we address the problem of efficiently locating relevant XML documents in a P2P network, where a user poses queries in a language such as XPath. We have developed a new system called psiX that runs on top of an existing distributed hashing framework. Under the psiX system, each XML document is mapped into an algebraic signature that captures the structural summary of the document. An XML query pattern is also mapped into a signature. The query's signature is used to locate relevant document signatures. Our signature scheme supports holistic processing of query patterns without breaking them into multiple path queries and processing them individually. The participating peers in the network collectively maintain a collection of distributed hierarchical indexes for the document signatures. Value indexes are built to handle numeric and textual values in XML documents. These indexes are used to process queries with value predicates. Our experimental study on PlanetLab demonstrates that psiX provides an efficient location service in a P2P network for a wide variety of XML documents.
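The idea of matching a whole query pattern against a document through a single signature can be illustrated with a small sketch. This is NOT the psiX algebraic signature scheme, whose construction the abstract does not describe. It swaps in a simpler, hypothetical encoding: every distinct parent/child tag pair is assigned its own prime number, and a tree's signature is the product of the primes of its distinct edges. A document can then contain a parent-child query twig only if the query's signature divides the document's signature. The names EdgeCodebook, edges, signature, and may_match, and the nested (tag, [children]) tree format, are all assumptions made for illustration.

```python
from math import prod

# Hypothetical stand-in for psiX's algebraic signatures (not the paper's scheme):
# each distinct (parent_tag, child_tag) edge gets its own prime number.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]


class EdgeCodebook:
    """Assigns a distinct prime to each parent/child tag pair on first use."""

    def __init__(self):
        self._codes = {}

    def prime_for(self, edge):
        if edge not in self._codes:
            if len(self._codes) == len(PRIMES):
                raise ValueError("out of primes for this toy example")
            self._codes[edge] = PRIMES[len(self._codes)]
        return self._codes[edge]


def edges(tree, parent=None):
    """Yield (parent_tag, child_tag) pairs from a nested (tag, [children]) tree."""
    tag, children = tree
    if parent is not None:
        yield (parent, tag)
    for child in children:
        yield from edges(child, tag)


def signature(tree, book):
    """Product of the primes of the tree's distinct edges (first-seen order)."""
    distinct = dict.fromkeys(edges(tree))  # deduplicates, keeps traversal order
    return prod(book.prime_for(e) for e in distinct)


def may_match(query_sig, doc_sig):
    """Divisibility test: necessary, but not sufficient, for a parent-child twig match."""
    return doc_sig % query_sig == 0


# Usage: the whole twig is checked at once, not split into separate path queries.
book = EdgeCodebook()
doc = ("catalog", [("book", [("title", []), ("price", [])]),
                   ("book", [("title", [])])])
doc_sig = signature(doc, book)                                      # 2 * 3 * 5 = 30
q_hit = signature(("book", [("title", []), ("price", [])]), book)   # 3 * 5 = 15
q_miss = signature(("book", [("author", [])]), book)                # new edge -> 7
print(may_match(q_hit, doc_sig))   # True
print(may_match(q_miss, doc_sig))  # False
```

Because divisibility is only a necessary condition, a check like this can act as a filter that prunes documents which cannot match, with the surviving candidates still needing exact evaluation. The sketch also ignores ancestor-descendant axes and value predicates, which the abstract says psiX handles with its hierarchical and value indexes.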
Keywords :
Internet; data models; database indexing; digital signatures; document handling; peer-to-peer computing; query languages; query processing; P2P network; XML document signature scheme; XML query pattern; algebraic signature; data model; distributed hash table; distributed hashing framework; distributed hierarchical index; extensible markup language; holistic processing; information interchange; multiple path query; numeric value; peer-to-peer network; psiX system; query language; textual value index; XML indexing; XPath; distributed hash tables
Journal_Title :
IEEE Transactions on Knowledge and Data Engineering
DOI :
10.1109/TKDE.2009.26