Title :
Keyword extraction of web pages based on domain thesaurus
Author :
Guowan He ; Jie Wang ; Yafeng Zhang ; Yan Peng
Author_Institution :
Sch. of Manage., Capital Normal Univ., Beijing, China
Abstract :
This paper presents a method for extracting keywords from web pages based on a domain thesaurus. The method scores candidate keywords using traditional statistical features, such as frequency and location, and also weights each candidate according to its relations in the domain thesaurus. As a result, it can identify domain keywords that occur infrequently on a page but carry more information in a specific field. Using keyword extraction from environment-domain web pages as an example, the paper introduces the framework and algorithm of the method. Experimental results show that, compared with the traditional TF-IDF method, this method performs better at keyword extraction on environment-related web pages, with an average of 20% in recall rate and an average of 15% in accuracy rate.
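The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea: a candidate's weight starts from its relative frequency, is raised if the term appears in the title (a stand-in for "location"), and is raised again if the term is in a domain thesaurus and has related terms on the same page. The function name `keyword_weights`, the parameters `title_boost` and `relation_boost`, the multiplicative combination, and the toy thesaurus are all assumptions made for illustration, not the authors' algorithm.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_weights(title, body, thesaurus, title_boost=2.0, relation_boost=1.0):
    """Weight candidate terms by frequency, location, and thesaurus relations.

    `thesaurus` maps a domain term to the set of terms it is related to.
    The boosts and the way they are combined are illustrative assumptions.
    """
    title_terms = set(tokenize(title))
    counts = Counter(tokenize(title) + tokenize(body))
    total = sum(counts.values())
    present = set(counts)

    weights = {}
    for term, n in counts.items():
        weight = n / total                      # statistical feature: frequency
        if term in title_terms:
            weight *= title_boost               # statistical feature: location
        if term in thesaurus:
            # Domain feature: reward thesaurus terms, more so when their
            # related terms also occur on the page.
            related_on_page = thesaurus[term] & present
            weight *= 1.0 + relation_boost * (1 + len(related_on_page))
        weights[term] = weight
    return weights


# Hypothetical page and thesaurus for the environment domain.
title = "Lake water quality"
body = "the survey the survey the survey found eutrophication in the lake water"
thesaurus = {"eutrophication": {"lake", "water"}}

weights = keyword_weights(title, body, thesaurus)
ranked = sorted(weights, key=weights.get, reverse=True)
```

In this toy page, "eutrophication" occurs once and "survey" three times, so a purely frequency-based weight ranks "survey" higher; with the thesaurus term and its two related terms on the page, the sketch weights "eutrophication" above "survey". A real system would also need stop-word filtering and proper word segmentation, which are omitted here.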
Keywords :
Internet; statistical analysis; Web pages; domain thesaurus; keyword extraction method; Accuracy; Feature extraction; Support vector machines; Thesauri; Keyword extraction; Keyword of web pages; Keyword weight
Conference_Title :
Cloud Computing and Intelligence Systems (CCIS), 2014 IEEE 3rd International Conference on
Print_ISBN :
978-1-4799-4720-1
DOI :
10.1109/CCIS.2014.7175749