DocumentCode :
589898
Title :
Improving navigation page detection by using DOM-based block text identification
Author :
Li Yue ; Dong Shou-bin ; Zheng Xiang ; Ma Bin-Hua
Author_Institution :
Sch. of Comput. Sci. & Eng., South China Univ. of Technol., Guangzhou, China
fYear :
2012
fDate :
21-23 Nov. 2012
Firstpage :
129
Lastpage :
134
Abstract :
The Internet changes very quickly, so web pages need to be classified for different uses. According to user purpose, web pages can be classified into navigation pages and content pages. Detecting navigation pages is useful for web crawling, topical detection, and similar tasks. In this paper, we use a DOM-based block text identification method to improve navigation page detection. Experimental results suggest that our method is more effective than prior methods.
Keywords :
Internet; pattern classification; text analysis; DOM-based block text identification; Internet; Web crawling; Web page classification; content page; navigation page detection; topical detection; Abstracts; Bars; Business; HTML; Navigation; Noise; Web pages; DOM; block text identification; navigation pages; web pages classification;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
ICT and Knowledge Engineering (ICT & Knowledge Engineering), 2012 10th International Conference on
Conference_Location :
Bangkok
ISSN :
2157-0981
Print_ISBN :
978-1-4673-2316-1
Type :
conf
DOI :
10.1109/ICTKE.2012.6408541
Filename :
6408541