DocumentCode
3632672
Title
Error analysis in Croatian morphosyntactic tagging
Author
Željko Agić;Marko Tadić;Zdravko Dovedan
Author_Institution
Department of Information Sciences, Faculty of Humanities and Social Sciences, University of Zagreb, Ivana Lučića 3, HR-10000, Croatia
fYear
2009
fDate
6/1/2009 12:00:00 AM
Firstpage
521
Lastpage
526
Abstract
In this paper, we provide detailed insight into the properties of errors generated by a stochastic morphosyntactic tagger assigning MULTEXT-East morphosyntactic descriptions to Croatian texts. Tagging the Croatia Weekly newspaper corpus with the CroTag tagger in stochastic mode revealed that approximately 85 percent of all tagging errors occur on nouns, adjectives, pronouns and verbs. Moreover, approximately 50 percent of these are shown to be incorrect assignments of case values. We provide various other distributional properties of errors in assigning morphosyntactic descriptions for these and other parts of speech. On the basis of these properties, we propose rule-based and stochastic strategies that could be integrated into the tagging module, creating a hybrid procedure to raise overall tagging accuracy for Croatian.
Keywords
"Error analysis","Tagging","Stochastic processes","Hidden Markov models","Natural languages","Stochastic systems","Speech","Smoothing methods","Natural language processing","Humans"
Publisher
IEEE
Conference_Titel
Information Technology Interfaces, 2009. ITI '09. Proceedings of the ITI 2009 31st International Conference on
ISSN
1330-1012
Print_ISBN
978-953-7138-15-8
Type
conf
DOI
10.1109/ITI.2009.5196140
Filename
5196140