DocumentCode :
3288607
Title :
A Suffix Based Part-of-Speech Tagger for Turkish
Author :
Dincer, Taner ; Karaoglan, Bahar ; Kisla, Tarik
Author_Institution :
Mugla Univ., Mugla
fYear :
2008
fDate :
7-9 April 2008
Firstpage :
680
Lastpage :
685
Abstract :
In this paper, we present a stochastic part-of-speech (PoS) tagger for Turkish. The tagger was developed primarily for information retrieval, but it can also serve as a lightweight PoS tagger for other purposes. It uses a well-established hidden Markov model of the language with a closed lexicon that consists of a fixed number of letters taken from the word endings. We evaluated seven different word-ending lengths against 30 training corpus sizes. The best-case accuracy obtained is 90.2%, with endings of 5 characters. The main contribution of this paper is a way of constructing a closed vocabulary for part-of-speech tagging that can be useful for highly inflected languages such as Turkish, Finnish, Hungarian, Estonian, and Czech.
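The closed-lexicon idea in the abstract can be illustrated with a minimal sketch: each word is replaced by its last k letters, so the tagger only ever sees a bounded set of endings. This is not the authors' implementation. The function names `word_ending` and `to_endings` are hypothetical, and keeping words shorter than k unchanged is an assumption, since the abstract does not say how short words are handled. The hidden Markov model itself is not shown.

```python
from typing import Iterable, List


def word_ending(word: str, k: int = 5) -> str:
    """Return the last k characters of a word.

    Assumption: words shorter than k are kept whole; the abstract
    does not specify this case.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return word[-k:]


def to_endings(tokens: Iterable[str], k: int = 5) -> List[str]:
    """Map a token sequence to its sequence of k-letter endings.

    The result is what a suffix-based tagger would treat as its
    observation sequence in place of the full word forms.
    """
    return [word_ending(token, k) for token in tokens]


if __name__ == "__main__":
    # Hypothetical example sentence; k=5 is the length the abstract
    # reports as giving the best-case accuracy.
    print(to_endings(["evlerimizden", "geldi"], k=5))
```

With k = 5, the inflected form "evlerimizden" is reduced to "izden", while "geldi" is already five letters long and stays unchanged.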
Keywords :
hidden Markov models; information retrieval; natural languages; vocabulary; Turkish language; hidden Markov model; information retrieval; suffix based stochastic part-of-speech tagger; vocabulary; Hidden Markov models; Indexing; Information retrieval; Information technology; Natural languages; Speech; Statistics; Stochastic processes; Tagging; Vocabulary; Agglutinative languages; Closed vocabulary; Information Retrieval; Part-Of-Speech Tagging
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Technology: New Generations, 2008. ITNG 2008. Fifth International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
0-7695-3099-0
Type :
conf
DOI :
10.1109/ITNG.2008.103
Filename :
4492560