• DocumentCode
    3317262
  • Title
    Improved estimation for unsupervised part-of-speech tagging
  • Author
    Wang, Qin Iris ; Schuurmans, Dale
  • Author_Institution
    Dept. of Comput. Sci., Alberta Univ., Edmonton, Alta., Canada
  • fYear
    2005
  • fDate
    30 Oct.-1 Nov. 2005
  • Firstpage
    219
  • Lastpage
    224
  • Abstract
    We demonstrate that a simple hidden Markov model can achieve state-of-the-art performance in unsupervised part-of-speech tagging by improving aspects of standard Baum-Welch (EM) estimation. One improvement uses word similarities to smooth the lexical tag → word probability estimates, which avoids over-fitting the lexical model. Another improvement constrains the model to preserve a specified marginal distribution over the hidden tags, which avoids over-fitting the tag → tag transition model. Although using more contextual information than an HMM remains desirable, improving basic estimation still leads to significant improvements and remains a prerequisite for training more complex models.
  • Keywords
    hidden Markov models; natural languages; unsupervised learning; word processing; hidden Markov model; lexical model; lexical tag; standard Baum-Welch estimation; tag transition model; unsupervised part-of-speech tagging; word probability estimate; word similarity; Buildings; Context modeling; Entropy; Hidden Markov models; Iris; Parameter estimation; State estimation; Tagging; Training data; Unsupervised learning
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on
  • Print_ISBN
    0-7803-9361-9
  • Type
    conf
  • DOI
    10.1109/NLPKE.2005.1598738
  • Filename
    1598738
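
The Abstract above mentions smoothing the tag → word (lexical) probabilities with word similarities. This record does not give the paper's actual estimator, so the following is only a minimal sketch of one generic way such smoothing could look: each tag's word distribution is interpolated with a copy of itself spread across similar words. The function name `smooth_emissions`, the row-normalized `similarity` matrix, and the interpolation weight `lam` are all assumptions introduced for illustration, not details taken from the paper.

```python
def smooth_emissions(emission, similarity, lam=0.5):
    """Interpolate each tag's word distribution with a similarity-spread copy.

    emission[t][w]   : P(word w | tag t); each row sums to 1
    similarity[w][v] : share of word w's mass passed to word v; each row sums to 1
                       (hypothetical input, e.g. from distributional similarity)
    lam              : hypothetical interpolation weight in [0, 1]
    """
    n_words = len(similarity)
    smoothed = []
    for row in emission:
        # Spread this tag's probability mass over similar words.
        spread = [sum(row[w] * similarity[w][v] for w in range(n_words))
                  for v in range(n_words)]
        # Mix the original and the spread distribution; rows still sum to 1
        # because both components do.
        smoothed.append([(1 - lam) * p + lam * s for p, s in zip(row, spread)])
    return smoothed


# Toy example: 2 tags, 3 words; words 0 and 1 are treated as similar.
emission = [[1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0]]
similarity = [[0.5, 0.5, 0.0],
              [0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0]]
smoothed = smooth_emissions(emission, similarity, lam=0.5)
```

In this toy case the first tag, which had put all of its mass on word 0, now assigns some probability to the similar word 1, while the second tag is unchanged because word 2 is similar only to itself. In an EM setting a step like this would plausibly be applied to the re-estimated lexical parameters at each iteration, but that placement is also an assumption.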