DocumentCode
1607294
Title
Learning topic knowledge to improve Chinese word sense disambiguation
Author
Wang, Huizhen ; Zhu, Jingbo
Author_Institution
Natural Language Process. Lab., Northeastern Univ., Shenyang, China
fYear
2010
Firstpage
175
Lastpage
180
Abstract
This paper addresses the problem of incorporating topic knowledge to improve Chinese word sense disambiguation. The key question is how to learn topic knowledge as features in the design of classifiers for disambiguating word senses. This paper presents two solutions for learning topic knowledge. In the first, a Chinese domain knowledge dictionary named NEUKD is used to generate a domain feature set; because NEUKD's coverage is limited, a constrained clustering algorithm is adopted to expand the dictionary. The second builds a topic feature set by applying the Latent Dirichlet Allocation (LDA) algorithm to a large-scale unlabeled corpus. Experiments on the SENSEVAL-3 Chinese dataset demonstrate that integrating topic knowledge improves the performance of Chinese word sense disambiguation.
Keywords
dictionaries; learning (artificial intelligence); natural language processing; pattern classification; pattern clustering; Chinese domain knowledge dictionary; Chinese word sense disambiguation; NEUKD; SENSEVAL-3 Chinese dataset; classifier design; constrained clustering algorithm; domain feature set generation; latent Dirichlet allocation algorithm; topic knowledge learning; Classification algorithms; Clustering algorithms; Context; Context modeling; Data models; Dictionaries; Training; Chinese word sense disambiguation; classification model; topic knowledge
fLanguage
English
Publisher
IEEE
Conference_Titel
Universal Communication Symposium (IUCS), 2010 4th International
Conference_Location
Beijing
Print_ISBN
978-1-4244-7821-7
Type
conf
DOI
10.1109/IUCS.2010.5666232
Filename
5666232