• DocumentCode
    2734692
  • Title
    One Sense per N-gram
  • Author
    Pengyuan Liu ; Shui Liu ; Shiqi Li ; Shiwen Yu
  • Author_Institution
    Inst. of Comput. Linguistics, Peking Univ., Beijing, China
  • Volume
    3
  • fYear
    2010
  • fDate
    Aug. 31 2010-Sept. 3 2010
  • Firstpage
    195
  • Lastpage
    198
  • Abstract
    This paper presents a novel supposition, One Sense Per N-gram (N > 1), which we believe applies to more linguistic phenomena than the celebrated One Sense Per Collocation supposition and can serve as a more general version of it, at least for Chinese. The new supposition is based on our observations while detecting errors in the sense annotations of People's Daily text tagged by an automatic WSD system. A preliminary experiment on the Chinese Word Sense Tagging Corpus shows that it holds with over 85.9% agreement for both nouns and verbs. Based on the supposition, we built a prototype naïve Bayes WSD system and tested it on the Multilingual Chinese-English Lexical Sample task (MCELS) of SemEval-2007. Experimental results show that the prototype improves the performance of the baseline system by 2.7%.
  • Keywords
    Bayes methods; natural language processing; Chinese language; Chinese word sense tagging corpus; People's Daily; Semeval-2007; automatic word sense disambiguation system; error detection process; linguistic phenomena; multilingual Chinese-English lexical sample task; naïve Bayes word sense disambiguation system; one sense per N-gram; Conferences; Context; Entropy; Prototypes; Semantics; Tagging; Training; One sense per N-gram; language model; word sense disambiguation; word sense tagging
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on
  • Conference_Location
    Toronto, ON
  • Print_ISBN
    978-1-4244-8482-9
  • Electronic_ISBN
    978-0-7695-4191-4
  • Type
    conf
  • DOI
    10.1109/WI-IAT.2010.268
  • Filename
    5614213