DocumentCode
3315203
Title
Conditional Random Fields combined FSM stemming method for Uyghur
Author
Wumaier, Aishan ; Yibulayin, Tuergen ; Kadeer, Zaokere ; Tian, Shengwei
Author_Institution
Sch. of Inf. Sci. & Eng., Xinjiang Univ., Urumqi, China
fYear
2009
fDate
8-11 Aug. 2009
Firstpage
295
Lastpage
299
Abstract
This paper presents a stemming algorithm for Uyghur that combines a deterministic finite automaton (DFA) for Uyghur noun suffixes with conditional random fields (CRF). Because Uyghur is an agglutinative language, stemming is an essential task for Uyghur language processing applications. We generate finite state machines (FSMs) for Uyghur noun inflectional suffixes by applying the morphotactic rules in reverse order. However, eight of the suffixes are similar to the ending parts of some words, and these suffixes make the FSM ambiguous. We apply the CRF model to resolve this ambiguity. This paper describes the steps of generating the FSM, building the CRF suffix-identifying model, and combining the CRF with the FSM.
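The reverse-order idea can be illustrated with a small sketch. It assumes that noun suffixes follow the order STEM + PLURAL + POSSESSIVE + CASE, so a reader scanning the word from the right meets the slots as CASE, then POSSESSIVE, then PLURAL. The suffix lists below are a hypothetical, incomplete subset in Latin transliteration; they are not the FSM from the paper. The function `segmentations` and the `min_stem` parameter are likewise illustrative names. Instead of compiling a DFA, the sketch walks the slots with backtracking and returns every split the ambiguous automaton would accept. Choosing one candidate is the job the paper gives to its CRF model, which is not implemented here.

```python
from typing import List, Tuple

# Hypothetical, incomplete subset of Uyghur noun suffixes (Latin transliteration).
# Listed in REVERSE morphotactic order: the case slot is nearest the word end.
SLOTS: List[Tuple[str, List[str]]] = [
    ("case", ["ning", "ni", "gha", "qa", "ge", "ke", "da", "de", "ta", "te", "din", "tin"]),
    ("poss", ["imiz", "ingiz", "im", "ing", "si", "i"]),
    ("plural", ["lar", "ler"]),
]


def segmentations(word: str, min_stem: int = 2) -> List[Tuple[str, List[str]]]:
    """Return every (stem, suffixes) split accepted by the reverse-order slot walk.

    Slots may be skipped but never revisited, so suffixes are only stripped
    in reverse morphotactic order. `min_stem` (an assumption) keeps at least
    that many characters as the stem.
    """
    results: List[Tuple[str, List[str]]] = []

    def walk(rest: str, slot: int, stripped: List[str]) -> None:
        # Every state is accepting: the remaining string is a candidate stem.
        # Suffixes were collected right-to-left, so reverse them for display.
        results.append((rest, list(reversed(stripped))))
        for i in range(slot, len(SLOTS)):
            _, suffixes = SLOTS[i]
            for suf in suffixes:
                if rest.endswith(suf) and len(rest) - len(suf) >= min_stem:
                    walk(rest[: -len(suf)], i + 1, stripped + [suf])

    walk(word, 0, [])
    return results


if __name__ == "__main__":
    # "kitablarda" (in the books) has a single full analysis: kitab + lar + da.
    print(segmentations("kitablarda"))
    # "ilim" (knowledge) ends in "im", which looks like a possessive suffix,
    # so the walk returns two candidates that a disambiguator must choose from.
    print(segmentations("ilim"))  # → [('ilim', []), ('il', ['im'])]
```

The second call shows the kind of ambiguity described above: a word ending that coincides with a suffix produces more than one accepted path. In the paper's approach, a CRF model resolves such cases.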
Keywords
deterministic automata; finite state machines; natural language processing; random processes; CRF suffix identifying model; DFA; FSM stemming method; Uyghur language processing; Uyghur noun inflectional suffix; agglutinative language; conditional random field model; deterministic finite automaton; finite state machine; morphotactic rule; reverse order; Algorithm design and analysis; Automata; Dictionaries; Information science; Morphology; Natural language processing; Natural languages; Statistical analysis; Ambiguous FSM; CRF; Uyghur; stemming;
fLanguage
English
Publisher
ieee
Conference_Titel
Computer Science and Information Technology, 2009. ICCSIT 2009. 2nd IEEE International Conference on
Conference_Location
Beijing
Print_ISBN
978-1-4244-4519-6
Electronic_ISBN
978-1-4244-4520-2
Type
conf
DOI
10.1109/ICCSIT.2009.5234727
Filename
5234727