Author_Institution :
Dept. of Comput., Xinzhou Teachers Univ., Xinzhou, China
Abstract :
Notice of Violation of IEEE Publication Principles
"Study to Eliminating Noisy Information in Web Pages based on Data Mining"
by G. Hu and Q. Zhao
in the 2010 Sixth International Conference on Natural Computation, pages: 660-663, August 2010.
After careful and considered review of the content and authorship of this paper by a duly constituted expert committee, this paper has been found to be in violation of IEEE's Publication Principles.
This paper contains significant portions of original text from the paper cited below. The original text was copied with insufficient attribution (including appropriate references to the original author(s) and/or paper title) and without permission.
Due to the nature of this violation, reasonable effort should be made to remove all past references to this paper, and future references should be made to the following article:
"Eliminating Noisy Information in Web Pages for Data Mining"
by L. Yi, B. Liu, X. Li
in the ACM Special Interest Group on Knowledge Discovery and Data Mining, pages: 1-10, August 2003
In this paper, we propose a noise elimination technique based on the following observation: in a given Web site, noisy blocks usually share some common contents and presentation styles, while the main content blocks of the pages are often diverse in their actual contents and/or presentation styles. Based on this observation, we propose a tree structure, called Style Tree, to capture the common presentation styles and the actual contents of the pages in a given Web site. By sampling the pages of the site, a Style Tree can be built for the site, which we call the Site Style Tree (SST). We then introduce an information-based measure to determine which parts of the SST represent noise and which parts represent the main contents of the site. The SST is employed to detect and eliminate noise in any Web page of the site by mapping this page to the SST. Experimental results show that our noise elimination technique is able to improve the mining results significantly.
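The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the information-based measure, so the entropy score below (normalized to the number of sampled pages) is an assumption, as are the names `StyleNode`, `build_sst`, `importance`, `noisy_blocks`, the nested-tuple page format, and the 0.2 threshold. Pages are assumed to share the same root tag.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class StyleNode:
    """One node of a simplified Site Style Tree (hypothetical structure)."""
    tag: str
    pages: int = 0                                      # sampled pages containing this node
    variants: Counter = field(default_factory=Counter)  # style or content -> page count
    children: dict = field(default_factory=dict)        # (position, tag) -> StyleNode


def merge(node, page):
    """Merge one page subtree, given as (tag, children-or-text), into the tree."""
    _, body = page
    node.pages += 1
    if isinstance(body, str):
        node.variants[body] += 1                        # leaf: record the actual content
        return
    node.variants[tuple(tag for tag, _ in body)] += 1   # record the child-tag layout
    for i, child in enumerate(body):
        sub = node.children.setdefault((i, child[0]), StyleNode(child[0]))
        merge(sub, child)


def build_sst(pages):
    """Build the tree from sampled pages (assumes a common root tag)."""
    root = StyleNode(pages[0][0])
    for page in pages:
        merge(root, page)
    return root


def importance(node):
    """Assumed measure: entropy of the variants, with log base = page count.

    Close to 0 when every page shares one variant, close to 1 when all differ.
    """
    m = node.pages
    if m <= 1:
        return 1.0
    return -sum((c / m) * math.log(c / m, m) for c in node.variants.values())


def noisy_blocks(node, page, path=None, threshold=0.2):
    """Map a page onto the tree and list leaf blocks scored at or below threshold."""
    tag, body = page
    path = path or tag
    if isinstance(body, str):
        return [path] if importance(node) <= threshold else []
    found = []
    for i, child in enumerate(body):
        sub = node.children.get((i, child[0]))
        if sub is not None:
            found += noisy_blocks(sub, child, f"{path}/{child[0]}[{i}]", threshold)
    return found


# Toy sample: the navigation bar and footer repeat, the middle block varies.
sample_pages = [
    ("body", [("div", "Home | About"), ("div", f"Story {n}"), ("div", "Footer")])
    for n in ("A", "B", "C")
]
sst = build_sst(sample_pages)
noisy = noisy_blocks(sst, sample_pages[0])
```

In this toy sample the blocks that repeat verbatim on every page score near zero and are reported as noisy, while the block whose content differs on each page scores near one and is kept. A real system would also need an HTML parser and a way to compare presentation attributes, both of which are omitted here.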
Keywords :
Web sites; data mining; tree data structures; Web page; Web site; information based measure; noise elimination; noisy blocks; page mapping; presentation style; site style tree; tree structure; Cleaning; Layout; Noise; Noise measurement; Web pages; Site Style Tree (SST); Style Tree