• DocumentCode
    3303119
  • Title
    Evaluation of Stopwords Removal on the Statistical Approach for Automatic Term Extraction
  • Author
    Braga, Ígor Assis
  • Author_Institution
    Inst. de Cienc. Mat. e de Comput. (ICMC), Univ. de Sao Paulo (USP), Sao Carlos, Brazil
  • fYear
    2009
  • fDate
    8-11 Sept. 2009
  • Firstpage
    142
  • Lastpage
    149
  • Abstract
    The construction of terminological products is important for the organization and dissemination of knowledge. This task can be supported by automatic term extraction, which has been treated as a Natural Language Processing problem. This paper investigates the interaction between the statistical approach to term extraction and stopword removal. Experiments conducted on two corpora show that stopword removal improves performance when extracting bigram terms, regardless of whether the removal is done before or after a statistical metric is applied. Based on this investigation, the paper recommends the more appropriate statistical metrics both for the case where stopwords can be removed and for the case where they cannot.
  • Keywords
    Humans; Natural language processing; Ontologies
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information and Human Language Technology (STIL), 2009 Seventh Brazilian Symposium in
  • Conference_Location
    Sao Carlos, Brazil
  • Print_ISBN
    978-1-4244-6008-3
  • Type
    conf
  • DOI
    10.1109/STIL.2009.8
  • Filename
    5532448