DocumentCode :
1652383
Title :
Maximum Entropy combined FSM stemming method for Uyghur
Author :
Wumaier, Aishan ; Kadeer, Zaokere ; Tursun, Parida ; Tian, Shengwei
Author_Institution :
Sch. of Inf. Sci. & Eng., Xinjiang Univ., Urumqi, China
fYear :
2009
Firstpage :
51
Lastpage :
55
Abstract :
This paper presents a stemming algorithm for Uyghur that combines a deterministic finite automaton (DFA) of noun suffixes with a maximum entropy (MaxEnt) model. Because Uyghur is an agglutinative language, stemming is an essential task for Uyghur language processing applications. We generate finite state machines (FSMs) for Uyghur noun inflectional suffixes by applying the morphotactic rules in reverse order. However, eight suffixes are similar to the ending part of some words, which makes the FSM ambiguous. We apply the MaxEnt model to resolve this ambiguity. The paper describes the steps of generating the FSM, building the MaxEnt suffix-identification model, and combining MaxEnt with the FSM.
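The idea in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes a toy two-slot suffix inventory (plural, then case) written in simplified Latin script, strips suffixes right to left, and consults a classifier before accepting each match. The names `strip_suffixes`, `stub_classifier`, and `PROTECTED_WORDS`, the suffix lists, and the example strings are all hypothetical, and a simple lookup predicate stands in for a trained MaxEnt model.

```python
from typing import Callable, List, Tuple

# Toy suffix slots in surface order: stem + plural + case.
# Hypothetical, simplified forms; NOT the suffix inventory used in the paper.
PLURAL = ("lar", "ler")
CASE = ("ni", "da", "din")

# Stripping runs right to left, so the slots are visited in reverse order.
REVERSE_SLOTS = (CASE, PLURAL)

# Stand-in for the MaxEnt model: toy words whose ending only looks like a suffix.
PROTECTED_WORDS = {"panda"}


def stub_classifier(word: str, suffix: str) -> bool:
    """Return True if the ending should be treated as a real suffix (assumed stub)."""
    return word not in PROTECTED_WORDS


def strip_suffixes(
    word: str,
    is_real_suffix: Callable[[str, str], bool],
    min_stem: int = 2,
) -> Tuple[str, List[str]]:
    """Return (stem, suffixes), with suffixes listed in surface order."""
    stem = word
    removed: List[str] = []
    for slot in REVERSE_SLOTS:
        # Try longer suffixes first so "din" is preferred over a shorter match.
        for suffix in sorted(slot, key=len, reverse=True):
            long_enough = len(stem) - len(suffix) >= min_stem
            if stem.endswith(suffix) and long_enough and is_real_suffix(stem, suffix):
                stem = stem[: -len(suffix)]
                removed.append(suffix)
                break  # at most one suffix per slot
    removed.reverse()
    return stem, removed


if __name__ == "__main__":
    print(strip_suffixes("kitablarni", stub_classifier))
    print(strip_suffixes("panda", stub_classifier))
    # Without disambiguation the automaton over-strips the ambiguous ending.
    print(strip_suffixes("panda", lambda w, s: True))
```

Passing the classifier in as a callable keeps the suffix automaton independent of the disambiguation model, so a trained model could replace the stub without changing the stripping loop.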
Keywords :
deterministic automata; finite state machines; maximum entropy methods; natural language processing; FSM stemming method; MaxEnt model; Uyghur language processing; Uyghur noun inflectional suffix DFA; agglutinative nature; finite state machine; maximum entropy; morphotactic rule; Algorithm design and analysis; Automata; Buildings; Doped fiber amplifiers; Entropy; Information science; Morphology; Natural languages; Software libraries; Statistical analysis; FSM; Maximum Entropy; Uyghur; stemming;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Speech Database and Assessments, 2009 Oriental COCOSDA International Conference on
Conference_Location :
Urumqi
Print_ISBN :
978-1-4244-4400-7
Electronic_ISBN :
978-1-4244-4400-7
Type :
conf
DOI :
10.1109/ICSDA.2009.5278378
Filename :
5278378