DocumentCode
3303119
Title
Evaluation of Stopwords Removal on the Statistical Approach for Automatic Term Extraction
Author
Braga, Ígor Assis
Author_Institution
Inst. de Cienc. Mat. e de Comput. (ICMC), Univ. de Sao Paulo (USP), Sao Carlos, Brazil
fYear
2009
fDate
8-11 Sept. 2009
Firstpage
142
Lastpage
149
Abstract
The construction of terminological products is important to the organization and dissemination of knowledge. This task can be supported by automatic term extraction, which is treated as a Natural Language Processing problem. This paper investigates the interaction between the statistical approach to term extraction and stopword removal. Experiments on two corpora show that stopword removal improves performance when extracting bigram terms, regardless of whether the removal is done before or after a statistical metric is applied. Based on this investigation, more appropriate statistical metrics can be recommended both for the case where stopwords can be removed and for the case where they cannot.
Keywords
Humans; Natural language processing; Ontologies
fLanguage
English
Publisher
IEEE
Conference_Titel
Information and Human Language Technology (STIL), 2009 Seventh Brazilian Symposium in
Conference_Location
Sao Carlos, Brazil
Print_ISBN
978-1-4244-6008-3
Type
conf
DOI
10.1109/STIL.2009.8
Filename
5532448