A Chinese unsupervised word sense disambiguation method based on semantic vector

Author

Lei Cui ; Xinfu Li ; Danqing Wang

Author_Institution

Coll. of Math. & Comput. Sci., Hebei Univ., Baoding, China

fYear

2012

fDate

21-23 April 2012

Firstpage

3009

Lastpage

3012

Abstract

Supervised machine learning methods for word sense disambiguation require the words of the training corpus to be annotated. To overcome the data sparseness problem and achieve good disambiguation performance, a large-scale annotated corpus must be built, but obtaining such a corpus carries a high manual cost. To address this problem, this paper proposes an unsupervised learning method that requires no manual annotation. First, feature words are mined using PMI (point-wise mutual information) and the Z test, and v words are defined to describe each sense of a polysemous word. The similarity between these sense words and the features of the polysemous word in its context is then calculated to determine the correct sense. The method is applied to ten typical polysemous words, and experimental results show that it is effective.
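The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact PMI estimate, the Z-test filter, or the similarity measure, so the code below assumes a context-window-level PMI, omits the Z test entirely, and uses cosine similarity as a stand-in. The function names (`pmi_scores`, `cosine`, `disambiguate`) and the toy English tokens are hypothetical.

```python
import math
from collections import Counter


def pmi_scores(contexts, target):
    """PMI between `target` and each word that co-occurs with it.

    `contexts` is a list of token lists (one per context window).
    Probabilities are estimated as the fraction of windows containing
    a word (assumption: the abstract does not specify the estimate).
    """
    n = len(contexts)
    window_freq = Counter()  # windows containing each word
    co_freq = Counter()      # windows containing both target and word
    for ctx in contexts:
        words = set(ctx)
        window_freq.update(words)
        if target in words:
            co_freq.update(words - {target})
    scores = {}
    for word, joint in co_freq.items():
        p_joint = joint / n
        p_target = window_freq[target] / n
        p_word = window_freq[word] / n
        scores[word] = math.log2(p_joint / (p_target * p_word))
    return scores


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def disambiguate(context_tokens, sense_vectors):
    """Pick the sense whose vector is most similar to the context.

    `sense_vectors` maps a sense label to a dict of sense words and
    weights (for example, the top-v feature words ranked by PMI).
    """
    context_vector = Counter(context_tokens)
    return max(sense_vectors,
               key=lambda sense: cosine(sense_vectors[sense], context_vector))
```

In this sketch, each sense would be described by its v highest-scoring feature words, and `disambiguate` returns the sense label whose word vector best matches the bag of words around the ambiguous word.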

Keywords

data mining; natural language processing; programming language semantics; unsupervised learning; word processing; Chinese unsupervised word sense disambiguation; PMI; Z test; data sparseness problem; feature word mining; marked corpus; point wise mutual information; polysemy; semantic vector; similarity calculation; supervise machine learning; training corpus; unsupervised learning method; v word; word annotation; Clustering algorithms; Context; Dictionaries; Educational institutions; Learning systems; Semantics; Vectors; similarity; word sense disambiguation

fLanguage

English

Publisher

IEEE

Conference_Titel

Consumer Electronics, Communications and Networks (CECNet), 2012 2nd International Conference on

Conference_Location

Yichang

Print_ISBN

978-1-4577-1414-6

Type

conf

DOI

10.1109/CECNet.2012.6201527

Filename

6201527