• DocumentCode
    389242
  • Title

    Chinese POS tagging based on maximum entropy model

  • Author

    Zhao, Jian ; Wang, Xiao-long

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., China
  • Volume
    2
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    601
  • Abstract
    Part-of-speech (POS) tagging is a basic task in natural language processing, and tagging precision has an important effect on the results of later processing stages such as syntactic analysis. This paper presents a Chinese POS tagger based on the maximum entropy model, which is trained on a large corpus annotated with Chinese POS tags and assigns the best tag sequence to the Chinese sentence to be annotated. In this model, all features that are useful for predicting the POS tags are mined to bring the model closer to the real case. In addition, to address overfitting, a smoothing method and a POS dictionary are used to reduce the model's dependence on the training data and to improve the efficiency of the search process. Open test results show that Chinese POS tagging with this method can achieve an accuracy of 96.8%.
  • Keywords
    feature extraction; grammars; maximum entropy methods; natural languages; smoothing methods; Chinese language; Chinese sentence; dictionary; feature selection; maximum entropy model; natural language processing; part of speech tagging; smoothing model; tag sequence; Computer science; Data mining; Entropy; Hidden Markov models; Natural language processing; Probability distribution; Smoothing methods; Speech; Tagging; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Machine Learning and Cybernetics, 2002. Proceedings. 2002 International Conference on
  • Print_ISBN
    0-7803-7508-4
  • Type

    conf

  • DOI
    10.1109/ICMLC.2002.1174406
  • Filename
    1174406
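
The approach summarized in the abstract (a maximum entropy model over contextual features, a smoothing method, and a POS dictionary that narrows the search) can be illustrated with a minimal sketch. This is not the authors' implementation. The feature templates, the use of L2 (Gaussian prior) regularization as the smoothing method, plain gradient ascent for training, greedy left-to-right decoding instead of a full best-sequence search, the toy corpus, and all function names are assumptions made for illustration only.

```python
import math
from collections import defaultdict

START, END = "<s>", "</s>"  # hypothetical boundary symbols, not from the paper

def features(words, i, prev_tag):
    """Assumed feature templates: current word, previous tag, neighbouring words."""
    prev_w = words[i - 1] if i > 0 else START
    next_w = words[i + 1] if i + 1 < len(words) else END
    return [f"w={words[i]}", f"t-1={prev_tag}", f"w-1={prev_w}", f"w+1={next_w}"]

def probs(weights, feats, candidates):
    """Maximum entropy form: p(t | context) = exp(sum of active weights) / Z."""
    scores = {t: sum(weights.get((f, t), 0.0) for f in feats) for t in candidates}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {t: math.exp(s - top) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

def train(corpus, tags, epochs=200, lr=0.2, l2=0.01):
    """Batch gradient ascent on the L2-penalized conditional log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for sent in corpus:
            words = [w for w, _ in sent]
            prev = START
            for i, (_, gold) in enumerate(sent):
                feats = features(words, i, prev)
                p = probs(weights, feats, tags)
                for f in feats:
                    grad[(f, gold)] += 1.0      # observed feature count
                    for t in tags:
                        grad[(f, t)] -= p[t]    # expected feature count
                prev = gold                      # gold history during training
        for key, g in grad.items():
            weights[key] += lr * (g - l2 * weights[key])  # L2 term acts as smoothing
    return weights

def tag(weights, words, tags, tag_dict):
    """Greedy decoding; the dictionary restricts candidate tags for known words."""
    out, prev = [], START
    for i, w in enumerate(words):
        candidates = tag_dict.get(w, tags)
        p = probs(weights, features(words, i, prev), candidates)
        prev = max(p, key=p.get)
        out.append(prev)
    return out

# Toy corpus with illustrative tags: r = pronoun, v = verb, n = noun, ns = place name.
corpus = [
    [("我", "r"), ("爱", "v"), ("北京", "ns")],
    [("他", "r"), ("爱", "v"), ("书", "n")],
    [("我", "r"), ("读", "v"), ("书", "n")],
]
tags = ["r", "v", "n", "ns"]

# POS dictionary: every tag each word was seen with in the training data.
seen = defaultdict(set)
for sent in corpus:
    for w, t in sent:
        seen[w].add(t)
tag_dict = {w: sorted(ts) for w, ts in seen.items()}

weights = train(corpus, tags)
print(tag(weights, ["他", "读", "书"], tags, tag_dict))  # ['r', 'v', 'n']
```

In this sketch the dictionary lookup in `tag` shrinks the candidate set for known words, which is one plausible reading of how a POS dictionary could speed up the search, while the `l2` term limits how closely the weights fit the training corpus. A fuller version would presumably replace the greedy loop with a beam or Viterbi-style search over tag sequences.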