Title :
A Suffix Based Part-of-Speech Tagger for Turkish
Author :
Dincer, Taner ; Karaoglan, Bahar ; Kisla, Tarik
Author_Institution :
Mugla Univ., Mugla
Abstract :
In this paper, we present a stochastic part-of-speech tagger for Turkish. The tagger was developed primarily for information retrieval, but it can also serve as a lightweight PoS tagger for other purposes. It uses a well-established hidden Markov model of the language with a closed lexicon that consists of a fixed number of letters taken from word endings. We evaluated seven word-ending lengths across 30 training corpus sizes. The best-case accuracy is 90.2%, obtained with 5-character endings. The main contribution of this paper is a way of constructing a closed vocabulary for part-of-speech tagging that can be useful for highly inflected languages such as Turkish, Finnish, Hungarian, Estonian, and Czech.
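The closed-lexicon idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy word list, the tag labels, and the choice to keep words shorter than k characters whole are all assumptions made for illustration. Each word is replaced by its last k characters, and the per-tag counts of these suffix symbols are the raw material from which HMM emission probabilities could be estimated.

```python
from collections import Counter, defaultdict


def suffix_symbol(word: str, k: int = 5) -> str:
    """Map a word to its last k characters.

    Assumption: words shorter than k are kept whole.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return word[-k:]


def emission_counts(tagged_words, k: int = 5):
    """Count how often each suffix symbol occurs with each tag."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[tag][suffix_symbol(word, k)] += 1
    return counts


# Toy, hand-made sample; the tag names are illustrative only.
sample = [
    ("evlerinde", "Noun"),
    ("şehirlerinde", "Noun"),
    ("geliyorum", "Verb"),
    ("gidiyorum", "Verb"),
]
counts = emission_counts(sample, k=5)
```

In this toy sample the two nouns share one suffix symbol and the two verbs share another, so four distinct word forms collapse into two lexicon entries. With a fixed k the number of possible symbols is bounded, which is what keeps the vocabulary closed.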
Keywords :
hidden Markov models; information retrieval; natural languages; vocabulary; Turkish language; suffix based stochastic part-of-speech tagger; Indexing; Information technology; Speech; Statistics; Stochastic processes; Tagging; Agglutinative languages; Closed vocabulary; Part-Of-Speech Tagging
Conference_Title :
Information Technology: New Generations, 2008. ITNG 2008. Fifth International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
0-7695-3099-0
DOI :
10.1109/ITNG.2008.103