  • DocumentCode
    537585
  • Title
    Clustering Web Retrieval Results Accompanied by Removing Duplicate Documents
  • Author
    Li, Xinye; Yang, Qinhai; Zeng, LinNa
  • Author_Institution
    Sch. of Electr. & Electron. Eng., North China Electr. Power Univ., Baoding, China
  • Volume
    1
  • fYear
    2010
  • fDate
    23-24 Oct. 2010
  • Firstpage
    259
  • Lastpage
    261
  • Abstract
    Keyword-based search engines usually return a large number of results, many of which are unrelated or share the same content, so automatic clustering technology is used to classify the retrieval results. When the number of Web retrieval results is large, however, clustering usually takes a long time, and the resulting clusters are not user-friendly because they still contain many documents with the same content. This paper proposes an improved clustering method that removes duplicate documents from the retrieval results. The removal operation is first executed in the initial partition stage of clustering. It is then executed again after the initial partition stage to remove the remaining duplicates thoroughly, and we propose an efficient removal method for this stage. Finally, we conducted experiments to verify our method.
  • Keywords
    Internet; document handling; pattern clustering; search engines; Web retrieval results; automatic clustering; clustering process; duplicate documents; keyword based search engine; unrelated documents; clustering; k-means; web retrieval result
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Information Systems and Mining (WISM), 2010 International Conference on
  • Conference_Location
    Sanya
  • Print_ISBN
    978-1-4244-8438-6
  • Type
    conf
  • DOI
    10.1109/WISM.2010.115
  • Filename
    5662322