DocumentCode
2262714
Title
Improving Keyphrase Extraction Using Wikipedia Semantics
Author
Shi, Tianyi ; Jiao, Shidou ; Hou, Junqi ; Li, Minglu
Author_Institution
Dept. of Comput. Sci., Shanghai Jiao Tong Univ., Shanghai
Volume
2
fYear
2008
fDate
20-22 Dec. 2008
Firstpage
42
Lastpage
46
Abstract
Keyphrase extraction plays a key role in various fields such as information retrieval and text classification. However, most traditional keyphrase extraction methods rely on word frequency and position rather than on the inherent semantic information of a document, which often results in inaccurate output. In this paper, we propose a novel automatic keyphrase extraction algorithm using semantic features mined from online Wikipedia. The algorithm first identifies candidate keyphrases using lexical methods and then constructs a semantic graph that connects the candidate keyphrases with the document topics. A link analysis algorithm is then applied to assign a semantic feature weight to each candidate keyphrase. Finally, several statistical and semantic features are combined by a regression model to predict the quality of the candidates. Encouraging results in our experiments show the effectiveness of our method.
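The abstract does not name the specific link analysis algorithm or regression model. The following is a minimal NumPy sketch of the last two stages under stated assumptions: weighted PageRank stands in for the link analysis step, ordinary least squares stands in for the regression model, and the relatedness values, the three-feature set, the labels, and the function names (weighted_pagerank, semantic_weights, fit_quality_model, predict_quality) are all hypothetical illustrations, not the authors' method.

```python
import numpy as np


def weighted_pagerank(weights: np.ndarray, damping: float = 0.85,
                      tol: float = 1e-10, max_iter: int = 200) -> np.ndarray:
    """Weighted PageRank by power iteration over a non-negative adjacency matrix.

    Assumption: PageRank is used here as a stand-in for the paper's
    unspecified link analysis algorithm.
    """
    n = weights.shape[0]
    strength = weights.sum(axis=1, keepdims=True)
    safe = np.where(strength > 0, strength, 1.0)
    # Row-normalise edge weights into transition probabilities;
    # nodes with no outgoing weight jump uniformly to every node.
    trans = np.where(strength > 0, weights / safe, 1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = (1.0 - damping) / n + damping * (trans.T @ rank)
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank
    return rank


def semantic_weights(cand_topic: np.ndarray, cand_cand: np.ndarray) -> np.ndarray:
    """Build an undirected candidate/topic graph and score each candidate.

    cand_topic[i, t]: hypothetical relatedness of candidate i to topic t.
    cand_cand[i, j]:  hypothetical relatedness of candidates i and j (symmetric).
    """
    k, m = cand_topic.shape
    graph = np.zeros((k + m, k + m))
    graph[:k, :k] = cand_cand
    graph[:k, k:] = cand_topic
    graph[k:, :k] = cand_topic.T  # mirror candidate-topic edges
    scores = weighted_pagerank(graph)
    return scores[:k]  # keep only the candidate nodes


def fit_quality_model(features: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Ordinary least squares with an intercept; returns [bias, w1, ..., wd].

    Assumption: a linear model stands in for the paper's regression model.
    """
    design = np.hstack([np.ones((features.shape[0], 1)), features])
    coef, *_ = np.linalg.lstsq(design, labels, rcond=None)
    return coef


def predict_quality(coef: np.ndarray, features: np.ndarray) -> np.ndarray:
    """Apply the fitted linear model to one feature row per candidate."""
    return coef[0] + features @ coef[1:]


# Hypothetical toy document: 3 candidate keyphrases, 2 topics.
cand_topic = np.array([[0.9, 0.1], [0.2, 0.8], [0.0, 0.1]])
cand_cand = np.array([[0.0, 0.3, 0.0], [0.3, 0.0, 0.1], [0.0, 0.1, 0.0]])
sem = semantic_weights(cand_topic, cand_cand)

# Hypothetical features per candidate:
# [term frequency, relative first position, semantic weight]
feats = np.column_stack([[0.05, 0.03, 0.01], [0.0, 0.4, 0.9], sem])

# Hypothetical labelled training rows (1.0 = good keyphrase, 0.0 = poor).
train_x = np.array([[0.05, 0.0, 0.30], [0.01, 0.9, 0.05], [0.03, 0.4, 0.20],
                    [0.02, 0.7, 0.10], [0.04, 0.1, 0.25]])
train_y = np.array([1.0, 0.0, 1.0, 0.0, 1.0])
coef = fit_quality_model(train_x, train_y)
print(predict_quality(coef, feats))
```

In this sketch, a candidate weakly connected to the topic nodes (the third one) receives a lower semantic weight than well-connected candidates, and that weight is then one input among the statistical features the regression combines.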
Keywords
Web sites; graph theory; information retrieval; pattern classification; text analysis; Wikipedia semantics; information retrieval; keyphrase extraction; lexical methods; semantic graph; text classification; Algorithm design and analysis; Application software; Data mining; Information retrieval; Information technology; Search engines; Taxonomy; Text categorization; Thesauri; Wikipedia; keyphrase extraction; wikipedia;
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Information Technology Application, 2008. IITA '08. Second International Symposium on
Conference_Location
Shanghai
Print_ISBN
978-0-7695-3497-8
Type
conf
DOI
10.1109/IITA.2008.211
Filename
4739723