  • DocumentCode
    3294314
  • Title
    Similarity Searching in Peer-to-Peer Databases
  • Author
    Bhattacharya, Indrajit; Kashyap, Srinivas R.; Parthasarathy, Srinivasan
  • Author_Institution
    Dept. of Comput. Sci., Maryland Univ., College Park, MD
  • fYear
    2005
  • fDate
    10-10 June 2005
  • Firstpage
    329
  • Lastpage
    338
  • Abstract
    We consider the problem of handling similarity queries in peer-to-peer databases. We propose an indexing and searching mechanism that, given a query object, returns the set of objects in the database that are semantically related to the query. The indexing scheme clusters data so that semantically related objects are partitioned into a small set of clusters, allowing for a simple and efficient similarity search strategy. The scheme also decouples object and node locations. Our adaptive replication and randomized lookup schemes exploit this feature and ensure that the number of copies of an object is proportional to its popularity and that all replicas are equally likely to serve a given query, thus achieving perfect load balancing. The techniques developed in this work are oblivious to the underlying DHT topology and can be implemented on a variety of structured overlays such as CAN, CHORD, Pastry, and Tapestry. We also present DHT-independent analytical guarantees for the performance of our algorithms in terms of search accuracy, cost, and load balance; the experimental results from our simulations confirm the insights derived from these analytical models.
  • Keywords
    database indexing; peer-to-peer computing; query formulation; query processing; CAN; CHORD; DHT topology; Pastry; Tapestry; adaptive replication; data clustering; data indexing; load balancing; peer-to-peer databases; query object; randomized lookup; similarity query handling; similarity search strategy; similarity searching; Analytical models; Computer science; Databases; Educational institutions; Indexing; Information retrieval; Load management; Peer to peer computing; Performance analysis; Topology;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Distributed Computing Systems, 2005. ICDCS 2005. Proceedings. 25th IEEE International Conference on
  • Conference_Location
    Columbus, OH
  • ISSN
    1063-6927
  • Print_ISBN
    0-7695-2331-5
  • Type
    conf
  • DOI
    10.1109/ICDCS.2005.74
  • Filename
    1437096