• DocumentCode
    1607294
  • Title
    Learning topic knowledge to improve Chinese word sense disambiguation
  • Author
    Wang, Huizhen; Zhu, Jingbo
  • Author_Institution
    Natural Language Process. Lab., Northeastern Univ., Shenyang, China
  • fYear
    2010
  • Firstpage
    175
  • Lastpage
    180
  • Abstract
    This paper addresses the problem of incorporating topic knowledge to improve Chinese word sense disambiguation. The key question is how to learn topic knowledge as features in the design of classifiers for disambiguating word senses. This paper presents two solutions for learning topic knowledge. In the first, a Chinese domain knowledge dictionary named NEUKD is used to generate a domain feature set; because the coverage of NEUKD is limited, a constrained clustering algorithm is adopted to expand the dictionary. The second builds a topic feature set by applying the Latent Dirichlet Allocation (LDA) algorithm to a large-scale unlabeled corpus. Experiments on the SENSEVAL-3 Chinese dataset demonstrate that integrating topic knowledge improves the performance of Chinese word sense disambiguation.
  • Keywords
    dictionaries; learning (artificial intelligence); natural language processing; pattern classification; pattern clustering; Chinese domain knowledge dictionary; Chinese word sense disambiguation; NEUKD; SENSEVAL-3 Chinese dataset; classifier design; constrained clustering algorithm; domain feature set generation; latent Dirichlet allocation algorithm; topic knowledge learning; Classification algorithms; Clustering algorithms; Context; Context modeling; Data models; Dictionaries; Training; classification model; topic knowledge
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Universal Communication Symposium (IUCS), 2010 4th International
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-7821-7
  • Type
    conf
  • DOI
    10.1109/IUCS.2010.5666232
  • Filename
    5666232
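
The abstract's first solution, using a domain dictionary to add domain features to a word sense classifier, can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not describe the format or contents of NEUKD, so `TOY_DOMAIN_DICT`, the helper names `domain_features` and `build_features`, and the `word=` / `domain=` feature naming are all hypothetical. The context is assumed to be already segmented into tokens, and English tokens are used only for readability.

```python
from collections import Counter

# Hypothetical toy domain dictionary standing in for NEUKD; the real
# dictionary's format and entries are not given in the record above.
TOY_DOMAIN_DICT = {
    "bank": "finance",
    "loan": "finance",
    "interest": "finance",
    "river": "geography",
    "shore": "geography",
    "water": "geography",
}


def domain_features(context_words, domain_dict):
    """Count how many context words map to each domain label."""
    counts = Counter()
    for word in context_words:
        domain = domain_dict.get(word)
        if domain is not None:  # words outside the dictionary add nothing
            counts[domain] += 1
    return counts


def build_features(context_words, domain_dict):
    """Merge bag-of-words features and domain features into one dict."""
    features = {f"word={w}": c for w, c in Counter(context_words).items()}
    for domain, count in domain_features(context_words, domain_dict).items():
        features[f"domain={domain}"] = count
    return features


# Context tokens around an ambiguous target word (assumed pre-segmented).
context = ["loan", "interest", "rate", "river"]
features = build_features(context, TOY_DOMAIN_DICT)
```

A feature dictionary of this kind could be passed to any classifier that accepts sparse named features; the record does not say which classifier the paper uses, so none is assumed here.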