DocumentCode :
3268923
Title :
Stop-words in keyphrase extraction problem
Author :
Popova, Svetlana ; Kovriguina, Liubov ; Mouromtsev, Dmitry ; Khodyrev, I.
Author_Institution :
St. Petersburg Nat. Res. Univ. of Inf. Technol., Mech. & Opt., St. Petersburg, Russia
fYear :
2013
fDate :
11-15 Nov. 2013
Firstpage :
113
Lastpage :
121
Abstract :
Keyword extraction is one of the most significant tasks in information retrieval. High-quality keyword extraction significantly influences progress in the following subtasks of information retrieval: classification and clustering, data mining, knowledge extraction and representation, etc. The research community has established a standard framework for keyphrase extraction. However, some possible design decisions remain unexplored within that paradigm. In this paper the authors survey the range of interdisciplinary methods applicable to automatic stop list feeding. The chosen method belongs to the class of experiential models. The research procedure based on this method makes it possible to improve the quality of keyphrase extraction at the stage of candidate keyphrase building. Several ways of automatically feeding the stop lists are also proposed. One of them is based on the principles of lexical statistics, and the results of applying it to the task point to the non-Gaussian nature of text corpora. The second, which uses the Inspec train collection to feed the stop lists, improves the quality considerably.
Keywords :
information retrieval; natural language processing; statistical analysis; text analysis; Inspec train collection; automatic stop list feeding; candidate keyphrase building; classification; clustering; data mining; interdisciplinary methods; keyphrase extraction problem; keyword extraction; knowledge extraction; knowledge representation; lexical statistics; nonGaussian nature; quality improvement; stop-words; text corpora; Abstracts; Buildings; Clustering algorithms; Data mining; Dictionaries; Frequency control; Training; informational retrieval; keyphrase extraction; keyphrase identification; stop words extraction
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Open Innovations Association (FRUCT), 2013 14th Conference of
Conference_Location :
Espoo
ISSN :
2305-7254
Type :
conf
DOI :
10.1109/FRUCT.2013.6737953
Filename :
6737953