DocumentCode :
3303119
Title :
Evaluation of Stopwords Removal on the Statistical Approach for Automatic Term Extraction
Author :
Braga, Ígor Assis
Author_Institution :
Inst. de Cienc. Mat. e de Comput. (ICMC), Univ. de Sao Paulo (USP), Sao Carlos, Brazil
fYear :
2009
fDate :
8-11 Sept. 2009
Firstpage :
142
Lastpage :
149
Abstract :
The construction of terminological products is important to the organization and dissemination of knowledge. This task can be supported by the automatic extraction of terms, which has been treated as a Natural Language Processing problem. In this paper, the interaction between the statistical approach to term extraction and the process of stopword removal is investigated. Experiments conducted on two corpora show that stopword removal improves performance when extracting bigram terms, regardless of whether the removal is done before or after the application of a statistical metric. As a result of this investigation, it is possible to recommend more appropriate statistical metrics both for the case where stopwords can be removed and for the case where this removal cannot be done.
Keywords :
Humans; Natural language processing; Ontologies;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information and Human Language Technology (STIL), 2009 Seventh Brazilian Symposium in
Conference_Location :
Sao Carlos, TBD, Brazil
Print_ISBN :
978-1-4244-6008-3
Type :
conf
DOI :
10.1109/STIL.2009.8
Filename :
5532448