• DocumentCode
    2668796
  • Title
    Semi-supervised word sense disambiguation based on weakly controlled sense induction
  • Author
    Broda, Bartosz; Piasecki, Maciej
  • Author_Institution
    Inst. of Inf., Wroclaw Univ. of Technol., Wroclaw, Poland
  • fYear
    2009
  • fDate
    12-14 Oct. 2009
  • Firstpage
    17
  • Lastpage
    24
  • Abstract
    Word Sense Disambiguation (WSD) in text is still a difficult problem, as the best supervised methods require laborious and costly manual preparation of training data. On the other hand, unsupervised methods achieve significantly lower accuracy and produce results that are not satisfactory for many applications. The goal of this work is to develop a WSD model that minimises the required human intervention but still assigns senses drawn from a manually created lexical semantic resource, i.e., a wordnet. The proposed method is based on clustering text snippets that include the words in focus. Next, a core is found for each cluster; the core is labelled with a word sense by a human and is then used to produce a classifier. The classifiers, constructed separately for each word, are applied to text. A comparison showed that the approach comes close in precision to a fully supervised method tested on the same data for Polish, and is much better than the baseline of selecting the most frequent sense. Possible ways of overcoming the limited coverage of the approach are also discussed in the paper.
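    The pipeline summarised in the abstract (cluster snippets, find a core per cluster, have a human label each core, build a per-word classifier) can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the authors' implementation: the abstract specifies neither the clustering algorithm, the definition of a core, nor the classifier, so the sketch assumes bag-of-words context vectors, cosine k-means seeded with the first k snippets, a core made of the `core_size` snippets closest to each centroid, and a nearest-centroid classifier. The toy snippets, the sense labels `bank/financial` and `bank/river`, and the simulated `annotate` function standing in for the human annotator are all invented for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus: context snippets for the ambiguous word "bank".
SNIPPETS = [
    "money deposit bank account loan",
    "river bank water fishing shore",
    "bank loan interest money credit",
    "water flows along the river bank",
    "account money bank credit card",
    "shore of the river bank grass water",
]

def bow(snippet):
    """Bag-of-words vector of a snippet (lower-cased whitespace tokens)."""
    return Counter(snippet.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)  # Counter.update adds counts
    return {k: c / len(vectors) for k, c in total.items()}

def kmeans(vectors, k, iterations=10):
    """Cosine k-means; the first k vectors seed the centroids (assumption)."""
    cents = [dict(v) for v in vectors[:k]]
    assign = []
    for _ in range(iterations):
        assign = [max(range(k), key=lambda c: cosine(v, cents[c])) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                cents[c] = centroid(members)
    return assign, cents

def cluster_cores(vectors, assign, cents, core_size):
    """Assumed core: the core_size members closest to their cluster centroid."""
    cores = {}
    for c, cent in enumerate(cents):
        members = [i for i, a in enumerate(assign) if a == c]
        members.sort(key=lambda i: cosine(vectors[i], cent), reverse=True)
        cores[c] = members[:core_size]
    return cores

def build_classifier(vectors, cores, labels):
    """Nearest-centroid classifier built only from the human-labelled cores."""
    by_sense = {}
    for c, idxs in cores.items():
        sense = labels.get(c)
        if sense is None:  # core left unlabelled: the cluster is dropped
            continue
        by_sense.setdefault(sense, []).extend(vectors[i] for i in idxs)
    protos = {s: centroid(vs) for s, vs in by_sense.items()}

    def classify(snippet):
        v = bow(snippet)
        return max(protos, key=lambda s: cosine(v, protos[s]))

    return classify

def train_wsd(snippets, k, annotate, core_size=2):
    """Cluster, extract cores, ask the annotator, and build the classifier."""
    vectors = [bow(s) for s in snippets]
    assign, cents = kmeans(vectors, k)
    cores = cluster_cores(vectors, assign, cents, core_size)
    labels = {c: annotate([snippets[i] for i in idxs])
              for c, idxs in cores.items() if idxs}
    return build_classifier(vectors, cores, labels)

def annotate(core_texts):
    """Simulated human annotator (hypothetical): picks a sense for one core."""
    if any("river" in t for t in core_texts):
        return "bank/river"
    return "bank/financial"

if __name__ == "__main__":
    classify = train_wsd(SNIPPETS, k=2, annotate=annotate)
    print(classify("bank loan money"))
    print(classify("fishing on the river bank"))
```

    In this sketch only the cluster cores, not every snippet, are shown to the annotator, and a cluster whose core the annotator declines to label (`annotate` returning `None`) contributes no sense to the classifier.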
  • Keywords
    learning (artificial intelligence); natural language processing; text analysis; Polish language; lexical semantics resource; supervised methods; text snippets clustering; weakly controlled sense induction; word sense disambiguation; wordnet; Computer science; Humans; Informatics; Information retrieval; Information technology; Manuals; Natural languages; Performance evaluation; Testing; Training data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Information Technology, 2009. IMCSIT '09. International Multiconference on
  • Conference_Location
    Mragowo
  • Print_ISBN
    978-1-4244-5314-6
  • Type
    conf
  • DOI
    10.1109/IMCSIT.2009.5352744
  • Filename
    5352744