Title :
A Letter Tagging Approach to Uyghur Tokenization
Abstract :
In this paper, we present a letter tagging approach (LTA) to Uyghur tokenization. Experiments show that the label bias problem caused by rich and complex suffixes can be resolved by combining LTA with conditional random fields (CRFs). The approach is more effective than previous work, and word tokenization accuracy reaches 93.3%. In the future, this tokenization research should also prove useful for information processing in other Altaic languages.
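The abstract does not specify the tag set or the CRF features, so the following is only a minimal sketch of what letter tagging can look like. It assumes a BIES scheme (B = first letter of a morpheme, I = inside, E = last letter, S = single-letter morpheme), uses the Uyghur Latin-script word "kitablar" ("books", split as kitab + lar) as an illustrative example, and defines hypothetical helpers `morphemes_to_tags`, `tags_to_morphemes` and `letter_features`. A CRF would be trained to predict one tag per letter from per-position features like those produced here.

```python
from typing import Dict, List

def morphemes_to_tags(morphemes: List[str]) -> List[str]:
    """Hypothetical helper: turn a stem+suffix split into one BIES tag per letter."""
    tags: List[str] = []
    for m in morphemes:
        if not m:
            continue  # ignore empty morphemes
        if len(m) == 1:
            tags.append("S")  # single-letter morpheme
        else:
            tags.append("B")                   # first letter of the morpheme
            tags.extend(["I"] * (len(m) - 2))  # letters strictly inside it
            tags.append("E")                   # last letter of the morpheme
    return tags

def tags_to_morphemes(word: str, tags: List[str]) -> List[str]:
    """Hypothetical helper: cut a word back into morphemes from per-letter tags.

    Tolerates ill-formed tag sequences (e.g. a B with no preceding E),
    which a sequence labeller can occasionally output.
    """
    if len(word) != len(tags):
        raise ValueError("need exactly one tag per letter")
    morphemes: List[str] = []
    current = ""
    for letter, tag in zip(word, tags):
        if tag in ("B", "S") and current:
            morphemes.append(current)  # close an unfinished morpheme
            current = ""
        current += letter
        if tag in ("E", "S"):
            morphemes.append(current)
            current = ""
    if current:
        morphemes.append(current)
    return morphemes

def letter_features(word: str, i: int, window: int = 2) -> Dict[str, str]:
    """Hypothetical feature extractor: the letter at i plus a +/-window of neighbours."""
    feats = {"letter": word[i]}
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        feats[f"letter[{off:+d}]"] = word[j] if 0 <= j < len(word) else "<pad>"
    # the (up to) three letters ending at position i, a crude suffix cue
    feats["ngram_ending_here"] = word[max(0, i - 2): i + 1]
    return feats

if __name__ == "__main__":
    word, split = "kitablar", ["kitab", "lar"]
    tags = morphemes_to_tags(split)
    print(list(zip(word, tags)))
    print(tags_to_morphemes(word, tags))
    print([letter_features(word, i) for i in range(len(word))][:2])
```

Tagging letters rather than whole suffixes keeps the label set tiny and fixed, however many suffixes a word carries. The list of per-position feature dicts is the kind of input that CRF toolkits such as sklearn-crfsuite accept.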
Keywords :
identification technology; natural language processing; Altaic language information processing; label bias; letter tagging; Uyghur tokenization; word tokenization; Accuracy; Hidden Markov models; Information processing; Labeling; Natural language processing; Tagging; Training; Letter tagging approach; Morpheme analysis (MA); Tokenization;
Conference_Title :
Asian Language Processing (IALP), 2010 International Conference on
Conference_Location :
Harbin
Print_ISBN :
978-1-4244-9063-9
DOI :
10.1109/IALP.2010.72