• DocumentCode
    3301401
  • Title
    Exploiting external knowledge sources to improve kernel-based Word Sense Disambiguation
  • Author
    Jin, Peng ; Li, Fuxin ; Zhu, Danqing ; Wu, Yunfang ; Yu, Shiwen
  • Author_Institution
    Institute of Computational Linguistics, Peking University, Beijing
  • fYear
    2008
  • fDate
    19-22 Oct. 2008
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    This paper proposes a novel approach to improve kernel-based word sense disambiguation (WSD). We first explain why linear kernels are more suitable for WSD and many other natural language processing problems than translation-invariant kernels. Based on the linear kernel, two external knowledge sources are integrated. The first is a set of linguistic rules that identify the crucial features. The second is a distributional similarity thesaurus, used to alleviate data sparseness by generalizing crucial features when they do not match word forms exactly. The experiments show that our approach outperforms the state-of-the-art system on the benchmark data from the English lexical sample task of SemEval-2007, and the improvement is statistically significant.
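    The abstract's core idea can be illustrated with a minimal sketch (not the authors' implementation): a linear-kernel SVM sense classifier whose "crucial" context features are expanded through a distributional-similarity thesaurus, so a test instance can match a training feature even without an exact word-form match. The thesaurus entries, the rule for picking crucial positions, the feature names, and the weight 0.5 below are all illustrative assumptions.

        # Sketch only: linear-kernel SVM WSD with thesaurus-generalized features.
        from sklearn.feature_extraction import DictVectorizer
        from sklearn.svm import LinearSVC

        # Hypothetical distributional-similarity thesaurus: word -> similar words.
        THESAURUS = {
            "river": ["stream", "creek"],
            "deposit": ["saving", "payment"],
        }

        def features(context_words, crucial_positions=(0, 1)):
            """Bag-of-words features plus generalized 'crucial' features.

            crucial_positions is a stand-in for the paper's linguistic rules
            (e.g. the words adjacent to the target word).
            """
            feats = {f"bow={w}": 1.0 for w in context_words}
            for i in crucial_positions:
                if i < len(context_words):
                    w = context_words[i]
                    feats[f"crucial={w}"] = 1.0
                    # Generalize the crucial feature via the thesaurus so it can
                    # still fire when the exact word form is unseen at test time.
                    for sim in THESAURUS.get(w, []):
                        feats[f"crucial={sim}"] = 0.5
            return feats

        # Toy training data for the target word "bank": (context words, sense).
        train = [
            (["deposit", "money", "bank"], "FINANCE"),
            (["river", "bank", "erosion"], "RIVERSIDE"),
        ]
        vec = DictVectorizer()
        X = vec.fit_transform([features(c) for c, _ in train])
        y = [s for _, s in train]
        clf = LinearSVC().fit(X, y)  # linear kernel, as argued for in the paper

        # "stream" never occurs in training, but the thesaurus-generalized
        # feature crucial=stream was created from "river", so it still matches.
        print(clf.predict(vec.transform([features(["stream", "bank", "fishing"])])))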
  • Keywords
    linguistics; natural language processing; support vector machines; thesauri; English lexical sample task; SemEval-2007; data sparseness; distributional similarity thesaurus; external knowledge sources; kernel-based word sense disambiguation; linear kernels; linguistic rules; natural language processing problems; support vector machine; Automation; Computational linguistics; Entropy; Kernel; Learning systems; Machine learning; Natural language processing; Support vector machines; Thesauri; Training data; kernel based method; support vector machine; word sense disambiguation;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    2008 International Conference on Natural Language Processing and Knowledge Engineering (NLP-KE '08)
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-4515-8
  • Electronic_ISBN
    978-1-4244-2780-2
  • Type
    conf
  • DOI
    10.1109/NLPKE.2008.4906810
  • Filename
    4906810