• DocumentCode
    2396965
  • Title
    Study on the elimination of duplicated multimedia webpages
  • Author
    Yang, Xiaojuan
  • Author_Institution
    Sch. of Commun., Shandong Normal Univ., Jinan, China
  • fYear
    2012
  • fDate
    19-20 May 2012
  • Firstpage
    2325
  • Lastpage
    2328
  • Abstract
    Multimedia web resources contain many duplicated web pages; eliminating these duplicates reduces storage costs and improves search engine performance. Based on an analysis of the classic duplicate-elimination algorithm, this article proposes an improved algorithm for judging web page text repetition. The new algorithm bases the elimination process on webpage content, which serves as the feature vector when comparing how closely webpages approximate one another, and analyzes how to capture a web page's theme. This yields a multidimensional improvement in the elimination of duplicated multimedia webpages.
  • Keywords
    Web sites; information retrieval; multimedia systems; search engines; text analysis; Web page text repetition; Web page theme capturing; Webpage approximation; classic algorithm; duplicated multimedia Webpage elimination process; duplicated page removal; multimedia Web resources; search engine performance improvement; storage cost reduction; vector characteristics; Accuracy; Algorithm design and analysis; Approximation algorithms; Feature extraction; Multimedia communication; Search engines; Web pages; Duplicates checking Algorithm; Feature code; Multimedia webpages; Topic search;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems and Informatics (ICSAI), 2012 International Conference on
  • Conference_Location
    Yantai
  • Print_ISBN
    978-1-4673-0198-5
  • Type
    conf
  • DOI
    10.1109/ICSAI.2012.6223520
  • Filename
    6223520
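
The abstract describes comparing webpages by treating their content as feature vectors, but this record gives no details of the paper's algorithm. The following is only a minimal sketch of the general idea, assuming plain term-frequency vectors and cosine similarity as the approximation measure. The function names and the 0.9 threshold are hypothetical choices for illustration, not taken from the paper.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Lower-case the text and count word occurrences (a simple term-frequency vector)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty page is treated as similar to nothing
    return dot / (norm_a * norm_b)


def drop_near_duplicates(pages: list[str], threshold: float = 0.9) -> list[str]:
    """Keep a page only if it is below the threshold against every page already kept.

    The threshold is an assumed value for illustration.
    """
    kept: list[str] = []
    kept_vectors: list[Counter] = []
    for page in pages:
        vec = term_vector(page)
        if all(cosine_similarity(vec, other) < threshold for other in kept_vectors):
            kept.append(page)
            kept_vectors.append(vec)
    return kept
```

In this sketch each page is compared against every page already kept, so the cost grows quadratically with the number of pages; a production system would more likely index compact fingerprints instead.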