Title :
Statistical Uyhur POS tagging with TAG predictor for unknown words
Author :
Tian, Shengwei ; Ibrahim, Turgun ; Umal, Hasan ; Yu, Long
Author_Institution :
Coll. of Inf. Sci. & Eng., Xinjiang Univ., Urumqi, China
Abstract :
Automatic text tagging is an important component of higher-level analysis of text corpora, and its output can be used in many natural language processing applications. Trigrams tags is an efficient statistical method for part-of-speech (POS) tagging. This paper describes a POS tagger for Uyhur text based on a hidden Markov model using trigrams tags. We describe the basic trigrams tags model, the smoothing techniques used to address the sparse data problem, and a tag predictor for unknown words. A comparison shows that our approach performs well on the tested Uyhur corpora.
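As a rough illustration of the smoothing step mentioned in the abstract, the sketch below estimates tag trigram probabilities and smooths them by linear interpolation of unigram, bigram, and trigram relative frequencies. The abstract does not say which smoothing technique the authors use, so interpolation is an assumption here, as are the function names `train_counts` and `smoothed_prob`, the `START` padding symbol, and the fixed weights. The unknown-word tag predictor is not covered.

```python
from collections import Counter

START = "<s>"  # hypothetical padding symbol for sentence-initial context


def train_counts(tagged_sentences):
    """Collect tag n-gram counts from sentences given as lists of (word, tag)."""
    counts = {name: Counter() for name in ("uni", "bi", "tri", "ctx1", "ctx2")}
    for sentence in tagged_sentences:
        tags = [START, START] + [tag for _, tag in sentence]
        for i in range(2, len(tags)):
            t1, t2, t3 = tags[i - 2], tags[i - 1], tags[i]
            counts["uni"][t3] += 1
            counts["bi"][(t2, t3)] += 1
            counts["tri"][(t1, t2, t3)] += 1
            counts["ctx1"][t2] += 1        # how often t2 occurs as bigram context
            counts["ctx2"][(t1, t2)] += 1  # how often (t1, t2) occurs as trigram context
    return counts


def smoothed_prob(counts, t1, t2, t3, lambdas=(0.1, 0.3, 0.6)):
    """Interpolated estimate of P(t3 | t1, t2).

    The weights are illustrative (assumed, not from the paper) and must sum to 1.
    """
    l1, l2, l3 = lambdas
    total = sum(counts["uni"].values())
    p_uni = counts["uni"][t3] / total if total else 0.0
    c1 = counts["ctx1"][t2]
    p_bi = counts["bi"][(t2, t3)] / c1 if c1 else 0.0
    c2 = counts["ctx2"][(t1, t2)]
    p_tri = counts["tri"][(t1, t2, t3)] / c2 if c2 else 0.0
    return l1 * p_uni + l2 * p_bi + l3 * p_tri


if __name__ == "__main__":
    # Toy corpus with made-up words and tags, for illustration only.
    corpus = [[("a", "N"), ("b", "V")], [("c", "N"), ("d", "N")]]
    counts = train_counts(corpus)
    print(smoothed_prob(counts, START, "N", "V"))
```

Because the unigram term is nonzero for every tag seen in training, a tag trigram that never occurred in the corpus still receives a small probability, which is the point of smoothing under sparse data.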
Keywords :
hidden Markov models; identification technology; speech processing; text analysis; TAG predictor; automatic text tagging; hidden Markov model; natural language processing applications; sparse data problem; statistical Uyhur POS tagging; statistical part-of-speech tagging; text corpora; trigrams tags; Computer networks; Context modeling; Hidden Markov models; Natural language processing; Predictive models; Probability; Smoothing methods; Support vector machines; Tagging; Technical Activities Guide -TAG; Part of Speech; Tag Predictor; Trigrams tags; Unknown Words;
Conference_Title :
Computing, Communication, Control, and Management, 2009. CCCM 2009. ISECS International Colloquium on
Conference_Location :
Sanya
Print_ISBN :
978-1-4244-4247-8
DOI :
10.1109/CCCM.2009.5267823