• DocumentCode
    657395
  • Title

    Increasing ranked accuracy for unstructured distributed search with dynamic replication

  • Author

    Richardson, S. ; Cox, I.J.

  • Author_Institution
    Dept. of Comput. Sci., Univ. Coll. London, London, UK
  • fYear
    2013
  • fDate
    9-11 Sept. 2013
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    In unstructured peer-to-peer networks it is necessary to query every node to guarantee finding a particular document. This is usually not feasible, so search is probabilistic. In previous work the metric of rank-accuracy was proposed to measure the performance of a query. It compares the set of documents retrieved from a search of a random subset of nodes to the set that would have been retrieved from an exhaustive search of all nodes, assigning a higher weight to higher ranked documents. It was shown that rank-accuracy can be significantly increased by replicating documents across nodes in proportion to their weighted retrieval rate, a function of query frequency and rank. However, this work assumed that the query distribution is known and unchanging, both of which are seldom true for practical systems. In this paper we make no such assumptions. We investigate how the popularity- and rank-aware distribution of documents can be approximated by replicating documents as queries are made. Using a simulated network of 10,000 nodes and 10,000,000 queries drawn from a Yahoo! web search engine log, we show that such a scheme can achieve over a 450% increase in overall rank-accuracy when compared to a uniform random distribution of documents. We also show this increase can be improved up to about 650% when the dynamically replicated documents are pruned down to just the terms that appear in queries.
  • Keywords
    Internet; document handling; peer-to-peer computing; query processing; search engines; Yahoo! Web search engine log; dynamic document replication; popularity-aware distribution; query distribution; query frequency; rank accuracy; rank-aware distribution; unstructured distributed search; unstructured peer-to-peer networks; Accuracy; Conferences; Educational institutions; Indexes; Peer-to-peer computing; Probabilistic logic; Web search;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Peer-to-Peer Computing (P2P), 2013 IEEE Thirteenth International Conference on
  • Conference_Location
    Trento
  • Type

    conf

  • DOI
    10.1109/P2P.2013.6688716
  • Filename
    6688716