DocumentCode
1652383
Title
Maximum Entropy combined FSM stemming method for Uyghur
Author
Wumaier, Aishan ; Kadeer, Zaokere ; Tursun, Parida ; Tian, Shengwei
Author_Institution
Sch. of Inf. Sci. & Eng., Xinjiang Univ., Urumqi, China
fYear
2009
Firstpage
51
Lastpage
55
Abstract
This paper presents a stemming algorithm for Uyghur that combines a deterministic finite automaton (DFA) over noun suffixes with a maximum entropy (MaxEnt) model. Because Uyghur is an agglutinative language, stemming is an essential task for Uyghur language processing applications. We generate finite state machines (FSMs) for Uyghur noun inflectional suffixes by applying the morphotactic rules in reverse order. However, eight suffixes resemble the endings of some words, which makes the FSM ambiguous. We apply the MaxEnt model to resolve this ambiguity. The paper describes the steps of generating the FSM, building the MaxEnt suffix identification model, and combining MaxEnt with the FSM.
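The reverse-order suffix stripping with a disambiguation hook described in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the Latin-transliterated suffix lists are a small toy inventory that ignores vowel harmony and vowel reduction, the slot order (case, then possessive, then plural, read from the word end) is an assumed simplification of the morphotactics, and `is_suffix` is a hypothetical callback standing in for a trained MaxEnt classifier.

```python
# Toy suffix inventory in Latin transliteration (illustrative assumption,
# not the rule set used in the paper).
CASE = ("ning", "ni", "din", "da")
POSSESSIVE = ("im", "ing")
PLURAL = ("lar", "ler")

# Assumed morphotactics: stem + plural + possessive + case.
# Read from the end of the word, the slots are visited in reverse order.
REVERSE_SLOTS = (CASE, POSSESSIVE, PLURAL)


def always_suffix(word, suffix):
    """Default disambiguator: accept every surface match as a real suffix."""
    return True


def stem(word, is_suffix=always_suffix, min_stem=2):
    """Strip at most one suffix per slot, working from the word end inward.

    `is_suffix(word, suffix)` is a hypothetical hook where a MaxEnt model
    would decide whether a matching ending is a true suffix or part of the stem.
    """
    current = word
    removed = []
    for slot in REVERSE_SLOTS:
        # Try longer suffixes first within a slot.
        for suffix in sorted(slot, key=len, reverse=True):
            fits = (current.endswith(suffix)
                    and len(current) - len(suffix) >= min_stem)
            if fits and is_suffix(current, suffix):
                current = current[: -len(suffix)]
                removed.append(suffix)
                break
    return current, removed
```

Without a disambiguator, a word whose final letters merely look like a suffix (here the ending "da" in "soda") is over-stemmed. Passing a callback that rejects that match leaves the word intact, which is the role the abstract assigns to the MaxEnt model.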
Keywords
deterministic automata; finite state machines; maximum entropy methods; natural language processing; FSM stemming method; MaxEnt model; Uyghur language processing; Uyghur noun inflectional suffix DFA; agglutinative nature; finite state machine; maximum entropy; morphotactic rule; Algorithm design and analysis; Automata; Buildings; Doped fiber amplifiers; Entropy; Information science; Morphology; Natural languages; Software libraries; Statistical analysis; FSM; Maximum Entropy; Uyghur; stemming;
fLanguage
English
Publisher
IEEE
Conference_Titel
Speech Database and Assessments, 2009 Oriental COCOSDA International Conference on
Conference_Location
Urumqi
Print_ISBN
978-1-4244-4400-7
Electronic_ISBN
978-1-4244-4400-7
Type
conf
DOI
10.1109/ICSDA.2009.5278378
Filename
5278378