DocumentCode
1696044
Title
Removing non-informative blocks from the web pages
Author
Gunasundari, R. ; Karthikeyan, S.
Author_Institution
Karpagam Univ., Coimbatore, India
fYear
2010
Firstpage
810
Lastpage
814
Abstract
With the enormous growth of the web, users easily get lost in its rich hyper structure. Developing user-friendly, automated tools that provide relevant information without redundant links is therefore a primary task for website owners. Users, however, are interested only in informative content, not in non-informative content blocks. Web pages often contain navigation sidebars, advertisements, search blocks, copyright notices, etc., which are not content blocks. The information contained in these non-content blocks can harm web mining, so it is important to separate the informative primary content blocks from the non-informative blocks. This paper proposes three different algorithms for removing non-content blocks from web pages. Removing non-informative content blocks from web pages can yield significant savings in storage and time.
Keywords
Web services; Web sites; content management; data mining; information retrieval; Web blocks; Web mining; Web pages; Website; informative contents; noisy blocks; non-informative content; Algorithm design and analysis; Entropy; Feature extraction; HTML; Web content mining; Web documents
fLanguage
English
Publisher
IEEE
Conference_Titel
Communication Control and Computing Technologies (ICCCCT), 2010 IEEE International Conference on
Conference_Location
Ramanathapuram
Print_ISBN
978-1-4244-7769-2
Type
conf
DOI
10.1109/ICCCCT.2010.5670731
Filename
5670731