DocumentCode :
2046108
Title :
Optimizing Near Duplicate Detection for P2P Networks
Author :
Papapetrou, Odysseas ; Ramesh, Sukriti ; Siersdorfer, Stefan ; Nejdl, Wolfgang
Author_Institution :
L3S Res. Center, Hannover, Germany
fYear :
2010
fDate :
25-27 Aug. 2010
Firstpage :
1
Lastpage :
10
Abstract :
In this paper, we propose a probabilistic algorithm for detecting near-duplicate text, audio, and video resources efficiently and effectively in large-scale P2P systems. To this end, we present a thorough cost and probabilistic analysis that allows the algorithm to adapt to network and data collection characteristics in order to minimize network cost. In addition, we extend the algorithm so that it can identify similar videos even when some of the videos are split into different files. A detailed theoretical analysis and a large-scale experimental evaluation on networks of up to 100,000 peers, using real-world datasets of more than 200 Gbytes, demonstrate the viability of our approach.
Keywords :
optimisation; peer-to-peer computing; probability; audio resources; data collection characteristics; large-scale P2P networks; near duplicate detection optimization; network cost minimization; probabilistic algorithm; real-world datasets; video resources; Algorithm design and analysis; Couplings; Indexing; Optimization; Peer to peer computing; Probabilistic logic;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Peer-to-Peer Computing (P2P), 2010 IEEE Tenth International Conference on
Conference_Location :
Delft
Print_ISBN :
978-1-4244-7140-9
Electronic_ISBN :
978-1-4244-7139-3
Type :
conf
DOI :
10.1109/P2P.2010.5570001
Filename :
5570001