• DocumentCode
    2284422
  • Title
    Fast similarity search in peer-to-peer networks
  • Author
    Bocek, Thomas; Hunt, Ela; Hausheer, David; Stiller, Burkhard
  • Author_Institution
    Dept. of Inf. IFI, Univ. of Zurich, Zurich
  • fYear
    2008
  • fDate
    7-11 April 2008
  • Firstpage
    240
  • Lastpage
    247
  • Abstract
    Peer-to-peer (P2P) systems show numerous advantages over centralized systems, such as load balancing, scalability, and fault tolerance, and they require certain functionality, such as search, repair, and message and data transfer. In particular, structured P2P networks perform an exact search in time logarithmic in the number of peers. However, keyword similarity search in a structured P2P network remains a challenge. Similarity search for service discovery can significantly improve service management in a distributed environment. As services are often described informally in text form, keyword similarity search can find the required services or data items more reliably. This paper presents a fast similarity search algorithm for structured P2P systems. The new algorithm, called P2P fast similarity search (P2PFastSS), finds similar keys in any distributed hash table (DHT) using the edit distance metric, and is independent of the underlying P2P routing algorithm. Performance analysis shows that P2PFastSS carries out a similarity search in time proportional to the logarithm of the number of peers. Simulations on PlanetLab confirm these results and show that a similarity search with 34,000 peers completes in less than three seconds on average. Thus, P2PFastSS is suitable for similarity search in large-scale network infrastructures, such as service description matching in service discovery or searching for similar terms in P2P storage networks.
  • Keywords
    cryptography; fault tolerance; file organisation; peer-to-peer computing; resource allocation; telecommunication network routing; P2P routing algorithm; P2PFastSS; PlanetLab; centralized systems; data transfer; distributed hash table; edit distance metric; fast similarity search; keyword similarity search; load balancing; peer-to-peer networks; service discovery; Computer science; Data engineering; Electronic mail; Fault tolerant systems; Informatics; Laboratories; Load management; Peer to peer computing; Routing; Scalability; DHT; P2P; similarity search
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Network Operations and Management Symposium, 2008. NOMS 2008. IEEE
  • Conference_Location
    Salvador, Bahia
  • ISSN
    1542-1201
  • Print_ISBN
    978-1-4244-2065-0
  • Electronic_ISBN
    1542-1201
  • Type
    conf
  • DOI
    10.1109/NOMS.2008.4575140
  • Filename
    4575140