• DocumentCode
    2828171
  • Title

    Detection and optimized disposal of near-duplicate pages

  • Author

    Qiu, Junping ; Zeng, Qian

  • Author_Institution
    Coll. of Inf. Manage., Wuhan Univ., Wuhan, China
  • Volume
    2
  • fYear
    2010
  • fDate
    21-24 May 2010
  • Abstract
    Search engines are an important tool for users to access network information resources. However, the large number of duplicate and near-duplicate pages adds to users' burden. Currently, search engines remove only duplicate pages and do not yet have effective strategies for detecting and disposing of near-duplicate pages. This paper analyzes existing algorithms to select an appropriate one for detecting near-duplicate pages, and optimizes the disposal strategy so that near-duplicate pages do not take up too much space in search results while still being used effectively. These measures will allow users to retrieve the information they need more easily.
  • Keywords
    search engines; near-duplicate pages detection; near-duplicate pages disposal; Algorithm design and analysis; Clustering algorithms; Educational institutions; Frequency; Information management; Information resources; Information retrieval; Uniform resource locators; Web pages; Duplicate Detection; Near-Duplicate; Ranking algorithm
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Future Computer and Communication (ICFCC), 2010 2nd International Conference on
  • Conference_Location
    Wuhan
  • Print_ISBN
    978-1-4244-5821-9
  • Type

    conf

  • DOI
    10.1109/ICFCC.2010.5497544
  • Filename
    5497544