• DocumentCode
    2867070
  • Title
    The Noise Reduction Method of Web Pages Based on Image Features
  • Author
    Yao, Haitao ; Yin, Zhiyi ; Zhu, Fuxi ; Gong, Changsheng
  • Author_Institution
    Sch. of Comput., Wuhan Univ., Wuhan, China
  • fYear
    2009
  • fDate
    11-13 Dec. 2009
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    Web pages at the same layer have similar presentation styles and noise blocks. Removing noise blocks from Web pages is the first step of data mining. Unlike traditional similarity measurement methods based on DOM trees, this paper proposes a noise removal method based on image features. In this method, Web pages are processed as images, so any image feature can be flexibly used as a criterion for measuring the similarity of noise blocks. Noise blocks and information blocks can then be distinguished by their measured similarity, which achieves noise reduction. Experimental results demonstrate that the method is accurate and reliable and that it supports joint measurement of multiple image features.
  • Keywords
    Internet; data mining; document image processing; Web pages; image feature; information block; noise block; noise reduction method; noise removal method; Cleaning; HTML; Information analysis; Navigation; Noise measurement; Noise reduction; XML
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Software Engineering, 2009. CiSE 2009. International Conference on
  • Conference_Location
    Wuhan
  • Print_ISBN
    978-1-4244-4507-3
  • Electronic_ISBN
    978-1-4244-4507-3
  • Type
    conf
  • DOI
    10.1109/CISE.2009.5366410
  • Filename
    5366410