DocumentCode
3268923
Title
Stop-words in keyphrase extraction problem
Author
Popova, Svetlana ; Kovriguina, Liubov ; Mouromtsev, Dmitry ; Khodyrev, I.
Author_Institution
St. Petersburg Nat. Res. Univ. of Inf. Technol., Mech. & Opt., St. Petersburg, Russia
fYear
2013
fDate
11-15 Nov. 2013
Firstpage
113
Lastpage
121
Abstract
Keyword extraction is one of the most significant tasks in information retrieval. High-quality keyword extraction significantly influences progress in the following subtasks of information retrieval: classification and clustering, data mining, knowledge extraction and representation, etc. The research community has established a framework for keyphrase extraction. However, some possible approaches remain unexplored within this paradigm. In this paper the authors survey the scope of interdisciplinary methods applicable to automatic stop list feeding. The chosen method belongs to the class of experiential models. A research procedure based on this method makes it possible to improve the quality of keyphrase extraction at the stage of candidate keyphrase building. Several ways of feeding stop lists automatically are also proposed. One of them is based on the provisions of lexical statistics, and the results of applying it to the task discussed point to the non-Gaussian nature of text corpora. The second, which uses the Inspec train collection to feed the stop lists, improves the quality considerably.
Keywords
information retrieval; natural language processing; statistical analysis; text analysis; Inspec train collection; automatic stop list feeding; candidate keyphrase building; classification; clustering; data mining; interdisciplinary methods; keyphrase extraction problem; keyword extraction; knowledge extraction; knowledge representation; lexical statistics; nonGaussian nature; quality improvement; stop-words; text corpora; Abstracts; Buildings; Clustering algorithms; Data mining; Dictionaries; Frequency control; Training; informational retrieval; keyphrase extraction; keyphrase identification; stop words extraction
fLanguage
English
Publisher
IEEE
Conference_Titel
Open Innovations Association (FRUCT), 2013 14th Conference of
Conference_Location
Espoo
ISSN
2305-7254
Type
conf
DOI
10.1109/FRUCT.2013.6737953
Filename
6737953