DocumentCode
3317262
Title
Improved estimation for unsupervised part-of-speech tagging
Author
Wang, Qin Iris ; Schuurmans, Dale
Author_Institution
Dept. of Comput. Sci., Alberta Univ., Edmonton, Alta., Canada
fYear
2005
fDate
30 Oct.-1 Nov. 2005
Firstpage
219
Lastpage
224
Abstract
We demonstrate that a simple hidden Markov model can achieve state-of-the-art performance in unsupervised part-of-speech tagging by improving aspects of standard Baum-Welch (EM) estimation. One improvement uses word similarities to smooth the lexical tag → word probability estimates, which avoids over-fitting the lexical model. Another improvement constrains the model to preserve a specified marginal distribution over the hidden tags, which avoids over-fitting the tag → tag transition model. Although using more contextual information than an HMM remains desirable, improving basic estimation still leads to significant improvements and remains a prerequisite for training more complex models.
Keywords
hidden Markov models; natural languages; unsupervised learning; word processing; hidden Markov model; lexical model; lexical tag; standard Baum-Welch estimation; tag transition model; unsupervised part-of-speech tagging; word probability estimate; word similarity; Buildings; Context modeling; Entropy; Hidden Markov models; Iris; Parameter estimation; State estimation; Tagging; Training data; Unsupervised learning
fLanguage
English
Publisher
IEEE
Conference_Titel
Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on
Print_ISBN
0-7803-9361-9
Type
conf
DOI
10.1109/NLPKE.2005.1598738
Filename
1598738