• DocumentCode
    2114086
  • Title
    A Chinese unsupervised word sense disambiguation method based on semantic vector
  • Author
    Lei Cui ; Xinfu Li ; Danqing Wang
  • Author_Institution
    Coll. of Math. & Comput. Sci., Hebei Univ., Baoding, China
  • fYear
    2012
  • fDate
    21-23 April 2012
  • Firstpage
    3009
  • Lastpage
    3012
  • Abstract
    Supervised machine learning methods for word sense disambiguation require the words of the training corpus to be annotated. To overcome the data sparseness problem and achieve good disambiguation results, a large-scale annotated corpus must be built, but obtaining such a corpus requires costly manual labor. To address this problem, this paper proposes an unsupervised learning method that needs no manual annotation. First, feature words are mined using PMI (pointwise mutual information) and the Z test, and v words are defined to describe each sense of a polysemous word. Then the similarity between these sense words and the features of the polysemous word in its context is calculated to determine the correct sense. The method is applied to ten typical polysemous words, and the experimental results show that it is effective.
  • Keywords
    data mining; natural language processing; programming language semantics; unsupervised learning; word processing; Chinese unsupervised word sense disambiguation; PMI; Z test; data sparseness problem; feature word mining; marked corpus; pointwise mutual information; polysemy; semantic vector; similarity calculation; supervised machine learning; training corpus; unsupervised learning method; v word; word annotation; Clustering algorithms; Context; Dictionaries; Educational institutions; Learning systems; Semantics; Vectors; similarity; word sense disambiguation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Consumer Electronics, Communications and Networks (CECNet), 2012 2nd International Conference on
  • Conference_Location
    Yichang
  • Print_ISBN
    978-1-4577-1414-6
  • Type
    conf
  • DOI
    10.1109/CECNet.2012.6201527
  • Filename
    6201527
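
The abstract names two steps: scoring candidate feature words with PMI, then comparing sense words with the context of the polysemous word. The following is only a minimal sketch of that general idea, not the authors' implementation. The function names `pmi_scores` and `pick_sense`, the document-level co-occurrence counting, and the summed-weight similarity are all assumptions made for illustration, and the Z test filtering step mentioned in the abstract is omitted.

```python
import math
from collections import Counter


def pmi_scores(contexts, target):
    """Score every word that co-occurs with `target` by pointwise mutual information.

    `contexts` is a list of token lists (one per context window). Probabilities are
    estimated as the fraction of contexts containing a word (an assumption).
    """
    n = len(contexts)
    word_count = Counter()   # number of contexts containing each word
    joint_count = Counter()  # number of contexts containing both the word and target
    for tokens in contexts:
        seen = set(tokens)
        word_count.update(seen)
        if target in seen:
            joint_count.update(seen - {target})

    p_target = word_count[target] / n
    scores = {}
    for word, joint in joint_count.items():
        p_joint = joint / n
        p_word = word_count[word] / n
        # PMI(w, t) = log2( P(w, t) / (P(w) * P(t)) )
        scores[word] = math.log2(p_joint / (p_word * p_target))
    return scores


def pick_sense(sense_features, context_tokens):
    """Return the sense whose weighted feature words overlap most with the context.

    `sense_features` maps a sense label to {feature word: weight}. Summing the
    weights of matching words is a stand-in for the paper's similarity measure.
    """
    context = set(context_tokens)

    def score(features):
        return sum(weight for word, weight in features.items() if word in context)

    return max(sense_features, key=lambda sense: score(sense_features[sense]))


# Toy English data, used only to exercise the functions.
contexts = [["bank", "money"], ["bank", "money"], ["bank", "river"], ["river", "water"]]
scores = pmi_scores(contexts, "bank")

sense_features = {
    "finance": {"money": 0.4, "loan": 0.3},
    "shore": {"river": 0.5, "water": 0.2},
}
chosen = pick_sense(sense_features, ["the", "river", "water"])
```

In this toy data, "money" appears with "bank" more often than chance would predict and so receives a positive PMI, while "river" receives a negative one. A real system would estimate the counts from a large unannotated Chinese corpus after word segmentation.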