DocumentCode
3599861
Title
Keyword extraction of web pages based on domain thesaurus
Author
Guowan He ; Jie Wang ; Yafeng Zhang ; Yan Peng
Author_Institution
Sch. of Manage., Capital Normal Univ., Beijing, China
fYear
2014
Firstpage
310
Lastpage
314
Abstract
This paper presents a method for extracting keywords from web pages based on a domain thesaurus. The method scores candidate keywords using traditional statistical features, such as frequency and location, and also weights them by their relations in the domain thesaurus. As a result, it can identify domain keywords that occur with low frequency but carry more domain-specific information. Using keyword extraction from environment-domain web pages as an example, the paper describes the framework and algorithm of the method. Experimental results show that, compared with the traditional TF-IDF method, the method performs better on environment-related web pages, with recall about 20% higher and accuracy about 15% higher on average.
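The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea: a frequency score, boosted for title location, then scaled up for terms found in a domain thesaurus. The function names, the boost parameters, and the toy thesaurus are all assumptions made for illustration and are not taken from the paper.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_keywords(title, body, thesaurus, related,
                   title_boost=2.0, domain_boost=3.0, related_boost=0.5):
    """Score candidate keywords (hypothetical weighting, not the paper's).

    thesaurus: set of domain terms.
    related:   dict mapping a domain term to the set of terms it is linked to.
    """
    title_tokens = set(tokenize(title))
    tokens = tokenize(title) + tokenize(body)
    counts = Counter(tokens)
    total = len(tokens)
    present = set(counts)

    scores = {}
    for term, n in counts.items():
        score = n / total                      # frequency feature
        if term in title_tokens:
            score *= title_boost               # location feature
        if term in thesaurus:
            # Assumed thesaurus weight: a base boost plus a bonus for each
            # related thesaurus term that also occurs on the page.
            linked = related.get(term, set()) & present
            score *= domain_boost + related_boost * len(linked)
        scores[term] = score
    return scores


# Toy page and thesaurus, invented for this example.
title = "River pollution report"
body = "The report says the report is long. Eutrophication harms the river."
thesaurus = {"eutrophication", "pollution", "river"}
related = {"eutrophication": {"pollution", "river"}}

scores = score_keywords(title, body, thesaurus, related)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy page, "eutrophication" occurs once while "the" occurs three times, yet the thesaurus weight ranks the domain term higher, which is the behaviour the abstract describes for low-frequency domain keywords.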
Keywords
Internet; statistical analysis; Web pages; domain thesaurus; keyword extraction method; Accuracy; Feature extraction; Support vector machines; Thesauri; Keyword extraction; Keyword of web pages; Keyword weight
fLanguage
English
Publisher
IEEE
Conference_Titel
Cloud Computing and Intelligence Systems (CCIS), 2014 IEEE 3rd International Conference on
Print_ISBN
978-1-4799-4720-1
Type
conf
DOI
10.1109/CCIS.2014.7175749
Filename
7175749