DocumentCode
2867070
Title
The Noise Reduction Method of Web Pages Based on Image Features
Author
Yao, Haitao ; Yin, Zhiyi ; Zhu, Fuxi ; Gong, Changsheng
Author_Institution
Sch. of Comput., Wuhan Univ., Wuhan, China
fYear
2009
fDate
11-13 Dec. 2009
Firstpage
1
Lastpage
5
Abstract
Web pages in the same layer have similar presentation styles and noise blocks. Removing noise blocks from Web pages is the first step of data mining. Unlike traditional similarity measurement methods based on DOM trees, this paper proposes a noise removal method based on image features. In this method, Web pages are processed as images, so any image feature can be used flexibly as a criterion for measuring the similarity of noise blocks. Noise blocks and information blocks can then be distinguished by their measured similarity, which achieves the noise reduction. Experimental results demonstrate that the method is accurate and reliable and that it supports joint measurement of multiple image features.
Keywords
Internet; data mining; document image processing; Web pages; image feature; information block; noise block; noise reduction method; noise removal method; Cleaning; HTML; Information analysis; Navigation; Noise measurement; Noise reduction; XML
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Intelligence and Software Engineering, 2009. CiSE 2009. International Conference on
Conference_Location
Wuhan
Print_ISBN
978-1-4244-4507-3
Electronic_ISBN
978-1-4244-4507-3
Type
conf
DOI
10.1109/CISE.2009.5366410
Filename
5366410