DocumentCode
2828171
Title
Detection and optimized disposal of near-duplicate pages
Author
Qiu, Junping ; Zeng, Qian
Author_Institution
Coll. of Inf. Manage., Wuhan Univ., Wuhan, China
Volume
2
fYear
2010
fDate
21-24 May 2010
Abstract
Search engines are an important tool for users to access network information resources. However, the large number of duplicate and near-duplicate pages adds to the user's burden. Currently, search engines remove only exact duplicate pages and do not yet have effective strategies for detecting and disposing of near-duplicate pages. This paper analyzes existing algorithms to select an appropriate one for detecting near-duplicate pages, and optimizes the disposal strategy so that near-duplicate pages do not take up too much space in search results while still being used effectively. This will allow users to retrieve the information they need more easily.
Keywords
search engines; near-duplicate pages detection; near-duplicate pages disposal; search engine; Algorithm design and analysis; Clustering algorithms; Educational institutions; Frequency; Information management; Information resources; Information retrieval; Search engines; Uniform resource locators; Web pages; Duplicate Detection; Information retrieval; Near-Duplicate; Ranking algorithm; Search Engine;
fLanguage
English
Publisher
IEEE
Conference_Titel
Future Computer and Communication (ICFCC), 2010 2nd International Conference on
Conference_Location
Wuhan
Print_ISBN
978-1-4244-5821-9
Type
conf
DOI
10.1109/ICFCC.2010.5497544
Filename
5497544