DocumentCode :
1954287
Title :
A Letter Tagging Approach to Uyghur Tokenization
Author :
Aisha, Batuer
fYear :
2010
fDate :
28-30 Dec. 2010
Firstpage :
11
Lastpage :
14
Abstract :
In this paper, we present a letter tagging approach (LTA) to Uyghur tokenization. Experiments show that the label bias problem (rich and complex suffixes) can be resolved using LTA combined with conditional random fields (CRFs), making the approach more effective than previous work: the accuracy of word tokenization reaches 93.3%. In future, our tokenization research should be useful for information processing in other Altaic languages.
Keywords :
identification technology; natural language processing; Altaic language information processing; label bias; letter tagging; Uyghur tokenization; word tokenization; Accuracy; Hidden Markov models; Information processing; Labeling; Natural language processing; Tagging; Training; Letter tagging approach; Morpheme analysis (MA); Tokenization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Asian Language Processing (IALP), 2010 International Conference on
Conference_Location :
Harbin
Print_ISBN :
978-1-4244-9063-9
Type :
conf
DOI :
10.1109/IALP.2010.72
Filename :
5681556