Title :
Word sense disambiguation method with topic feature
Author :
Yun Zhou ; Ting Wang ; Zhiyuan Wang ; Lupeng Zhang
Author_Institution :
Comput. Sch., Nat. Univ. of Defense Technol., Changsha, China
Abstract :
Word sense disambiguation (WSD) is usually confined to a single sentence, which results in very short texts. Moreover, the scarcity of sense-labelled corpora causes serious data sparsity. Short texts and data sparsity together hinder further improvement of WSD performance. As an unsupervised learning method, a topic model clusters and compresses the semantic information in a text, improving the generalization of words. This paper proposes a WSD method that integrates topic features: the classifier is enhanced with LDA (Latent Dirichlet Allocation) topic features inferred from a background corpus, and the method is evaluated on the all-words WSD task of Senseval-3. Using only a part of SemCor as the labelled training dataset, the proposed method achieves an F1 of 0.680, which is better than the best system in Senseval-3 (0.652) and, to the best of our knowledge, the best result reported in the literature (0.670). Experimental results also show that an appropriate number of topics benefits WSD, that consistency between the background corpus and the evaluation dataset is the key to improving WSD, and that a larger, balanced background corpus brings a greater performance increase to the WSD system.
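The abstract describes enhancing a supervised WSD classifier with LDA topic features inferred from a background corpus. The sketch below illustrates that general idea only, not the authors' implementation: it uses toy data, scikit-learn's LatentDirichletAllocation, and a logistic-regression classifier, all of which are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the paper's system): fit LDA on an
# unlabelled background corpus, then concatenate each training context's
# local bag-of-words features with its inferred topic proportions before
# training a per-word sense classifier.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

# Hypothetical unlabelled background corpus.
background_docs = [
    "the bank approved the loan application",
    "she deposited cash at the bank branch",
    "they walked along the river bank at dusk",
    "the bank of the stream was covered in reeds",
]

# Toy sense-labelled contexts for the ambiguous word "bank"
# (0 = financial institution, 1 = side of a river).
train_contexts = [
    "the bank raised interest rates",
    "fishermen sat on the bank of the river",
]
train_labels = [0, 1]

# 1) Fit LDA topics on the background corpus.
vectorizer = CountVectorizer(stop_words="english")
bg_counts = vectorizer.fit_transform(background_docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(bg_counts)

# 2) Feature vector = local bag-of-words + inferred topic distribution.
def featurize(texts):
    counts = vectorizer.transform(texts)
    topics = lda.transform(counts)  # per-document topic proportions
    return np.hstack([counts.toarray(), topics])

clf = LogisticRegression(max_iter=1000).fit(featurize(train_contexts), train_labels)

# 3) Disambiguate a new context for the target word.
print(clf.predict(featurize(["the boat drifted toward the bank"])))
```

In this sketch the topic proportions act as a compressed, corpus-level signal that supplements the sparse local features, which is the role the abstract attributes to the LDA features; the number of topics and the choice of background corpus are the tunable factors the paper's experiments examine.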
Keywords :
natural language processing; semantic networks; text analysis; unsupervised learning; LDA topic feature; SemCor; Senseval-3; WSD method; background corpus; data sparsity; evaluation dataset; labelled training dataset; latent Dirichlet allocation topic feature; performance improvement; semantic information; sense-labelled corpus; short text; unsupervised learning method; word sense disambiguation method; LDA; background raw corpus; topic feature; word sense disambiguation;
Conference_Title :
Information Science and Control Engineering 2012 (ICISCE 2012), IET International Conference on
Conference_Location :
Shenzhen
Electronic_ISBN :
978-1-84919-641-3
DOI :
10.1049/cp.2012.2305