Title :
Keyphrase extraction based on semantic relatedness
Author :
Xie, Fei ; Wu, Xindong ; Hu, Xuegang
Author_Institution :
Dept. of Comput. Sci., Hefei Univ. of Technol., Hefei, China
Abstract :
Keyphrase extraction is a fundamental research task in natural language processing and text mining. Previous keyphrase extraction methods based on semantic analysis share a limitation: the semantic features they can acquire for phrases are restricted by the thesaurus they are built on and by its language. This paper proposes an approach that acquires the semantic features of phrases from a single document and uses them to extract the document's keyphrases. Semantic relatedness degrees between phrases are computed from word co-occurrence information in the document, and the document is represented as a relatedness graph. Keyphrases are then extracted based on the semantic relatedness features acquired from this graph. Our experiments show that the proposed keyphrase extraction method consistently outperforms the baseline methods TFIDF and Kea. Furthermore, the approach is not domain-specific: it generalizes well when trained on one domain (journal articles) and tested on another (news web pages).
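The pipeline outlined in the abstract (word co-occurrence statistics, a phrase relatedness graph, graph-derived features) can be illustrated with a minimal Python sketch. The abstract does not state the relatedness formula, the candidate phrase generator, or the graph features used, so everything specific here is an assumption made for illustration: co-occurrence is counted per sentence, word relatedness is a Dice coefficient, phrase relatedness is the average over cross-phrase word pairs, and phrases are ranked by weighted degree. The paper's method is trained on labeled data, which this unsupervised ranking does not reproduce. All function names are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def sentence_word_sets(text):
    """Split text into sentences; return the set of lowercase words in each."""
    sentences = re.split(r"[.!?]+", text.lower())
    return [set(re.findall(r"[a-z]+", s)) for s in sentences if s.strip()]


def cooccurrence_counts(word_sets):
    """Count the sentences containing each word and each unordered word pair."""
    word_freq = Counter()
    pair_freq = Counter()
    for words in word_sets:
        word_freq.update(words)
        pair_freq.update(frozenset(p) for p in combinations(sorted(words), 2))
    return word_freq, pair_freq


def word_relatedness(a, b, word_freq, pair_freq):
    """Dice coefficient over sentence co-occurrence (assumed measure)."""
    if a == b:
        return 1.0 if word_freq[a] else 0.0
    denom = word_freq[a] + word_freq[b]
    if denom == 0:
        return 0.0
    return 2.0 * pair_freq[frozenset((a, b))] / denom


def phrase_relatedness(p, q, word_freq, pair_freq):
    """Average word relatedness over all cross-phrase word pairs (assumed)."""
    pw, qw = p.split(), q.split()
    if not pw or not qw:
        return 0.0
    total = sum(word_relatedness(a, b, word_freq, pair_freq)
                for a in pw for b in qw)
    return total / (len(pw) * len(qw))


def relatedness_graph(phrases, word_freq, pair_freq):
    """Build an undirected weighted graph as a dict of adjacency dicts."""
    graph = {p: {} for p in phrases}
    for p, q in combinations(phrases, 2):
        weight = phrase_relatedness(p, q, word_freq, pair_freq)
        if weight > 0:
            graph[p][q] = weight
            graph[q][p] = weight
    return graph


def rank_phrases(graph):
    """Rank phrases by weighted degree, breaking ties alphabetically."""
    scores = {p: sum(nbrs.values()) for p, nbrs in graph.items()}
    return sorted(scores, key=lambda p: (-scores[p], p))
```

A typical call chain would be `sentence_word_sets`, then `cooccurrence_counts`, then `relatedness_graph` over a list of lowercase candidate phrases, then `rank_phrases`. Weighted degree is only one possible graph feature; it is used here because it needs no parameters.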
Keywords :
data mining; natural language processing; text analysis; word processing; keyphrase extraction methods; semantic analysis; semantic features acquisition; semantic relatedness; text mining; feature extraction; probability; semantics; thesauri; Web pages; keyphrase extraction; word co-occurrence
Conference_Title :
Cognitive Informatics (ICCI), 2010 9th IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-8041-8
DOI :
10.1109/COGINF.2010.5599721