• DocumentCode
    3599861
  • Title
    Keyword extraction of web pages based on domain thesaurus
  • Author
    Guowan He ; Jie Wang ; Yafeng Zhang ; Yan Peng
  • Author_Institution
    Sch. of Manage., Capital Normal Univ., Beijing, China
  • fYear
    2014
  • Firstpage
    310
  • Lastpage
    314
  • Abstract
    This paper presents a method for extracting keywords from web pages based on a domain thesaurus. The method extracts keywords using traditional statistical features, such as frequency and location, and also weights candidate keywords according to their relations in the domain thesaurus. It can effectively identify domain keywords that occur with low frequency but carry more information in a specific area. Using keyword extraction from web pages in the environment domain as an example, the paper introduces the framework and algorithm of the method. Experimental results show that, compared with the traditional TF-IDF method, this method extracts keywords from environment-related web pages more effectively, by an average of 20% in recall and 15% in accuracy.
  • Keywords
    Internet; statistical analysis; Web pages; domain thesaurus; keyword extraction method; Accuracy; Feature extraction; Support vector machines; Thesauri; Keyword extraction; Keyword of web pages; Keyword weight
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cloud Computing and Intelligence Systems (CCIS), 2014 IEEE 3rd International Conference on
  • Print_ISBN
    978-1-4799-4720-1
  • Type
    conf
  • DOI
    10.1109/CCIS.2014.7175749
  • Filename
    7175749
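
The abstract describes weighting candidate keywords by frequency, location, and membership in a domain thesaurus, but this record does not give the formula. The following is only a minimal sketch of that idea, not the authors' algorithm: the function name `score_candidates`, the multiplicative form, and both boost values are assumptions made for illustration.

```python
from collections import Counter


def score_candidates(title_tokens, body_tokens, thesaurus_terms,
                     title_boost=2.0, thesaurus_boost=1.5):
    """Rank candidate keywords, best first.

    Hypothetical scoring: relative frequency, multiplied by a boost when
    the term appears in the title (location feature) and by another boost
    when the term is in the domain thesaurus. The boosts are assumed
    values, not taken from the paper.
    """
    freq = Counter(title_tokens) + Counter(body_tokens)
    total = sum(freq.values())
    title_set = set(title_tokens)

    scores = {}
    for term, count in freq.items():
        score = count / total              # frequency feature
        if term in title_set:              # location feature
            score *= title_boost
        if term in thesaurus_terms:        # domain thesaurus feature
            score *= thesaurus_boost
        scores[term] = score

    # Sort terms by descending score.
    return sorted(scores, key=scores.get, reverse=True)


# A rare thesaurus term can outrank a frequent general term.
body = ["smog", "report", "city", "city"]
ranked = score_candidates([], body, {"smog"}, thesaurus_boost=3.0)
```

With a large enough thesaurus boost, the low-frequency domain term `smog` moves ahead of the more frequent `city`, which is the behaviour the abstract attributes to the method.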