• DocumentCode
    3302919
  • Title

    Identification of Multiword Expressions in Technical Domains: Investigating Statistical and Alignment-Based Approaches

  • Author

    Villavicencio, Aline ; de Medeiros Caseli, Helena ; Machado, Andre

  • fYear
    2009
  • fDate
    8-11 Sept. 2009
  • Firstpage
    27
  • Lastpage
    35
  • Abstract
    Multiword Expressions (MWEs) are one of the stumbling blocks for more precise Natural Language Processing (NLP) systems. The lack of coverage of MWEs in resources can negatively affect the performance of tasks and applications, and can lead to loss of information or communication errors, especially in technical domains where MWEs are frequent. This paper investigates some approaches to the identification of MWEs in technical corpora based on association measures, part-of-speech information, and lexical alignment information. We examine how factors such as the sources of information used for identification and evaluation influence their performance. While the association measures emphasize recall, the alignment method focuses on precision.
  • Keywords
    Application software; Computer science; Global warming; Humans; Informatics; Information resources; Natural language processing; Natural languages; Performance loss; Vocabulary; Lexical Acquisition; Multiword Expressions; Natural Language Processing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information and Human Language Technology (STIL), 2009 Seventh Brazilian Symposium in
  • Conference_Location
    Sao Carlos, Brazil
  • Print_ISBN
    978-1-4244-6008-3
  • Type

    conf

  • DOI
    10.1109/STIL.2009.33
  • Filename
    5532435