Title :
On certain aspects of Kazakh part-of-speech tagging
Author :
Makazhanov, Aibek ; Yessenbayev, Zhandos ; Sabyrgaliyev, Islam ; Sharafudinov, Anuar ; Makhambetov, Olzhas
Author_Institution :
Nazarbayev Univ. Res. & Innovation Syst., Astana, Kazakhstan
Abstract :
We compare and discuss various approaches to part-of-speech (POS) tagging of texts written in Kazakh, an agglutinative and highly inflectional Turkic language. In Kazakh, a single root may produce hundreds of word forms, and it is difficult, if possible at all, to label enough training data to cover the vast set of possible word forms in the language. Thus, current state-of-the-art statistical POS taggers may not be as effective for Kazakh as for morphologically less complex languages, e.g. English. The choice of a POS tag set may also influence the informativeness and accuracy of tagging.
Keywords :
natural language processing; text analysis; word processing; Kazakh part-of-speech tagging; Turkic language; text POS tagging; word form; Accuracy; Computational linguistics; Natural language processing; Speech; Tagging; Training; Training data;
Conference_Title :
Application of Information and Communication Technologies (AICT), 2014 IEEE 8th International Conference on
Conference_Location :
Astana
Print_ISBN :
978-1-4799-4120-9
DOI :
10.1109/ICAICT.2014.7035953