• DocumentCode
    3315203
  • Title
    Conditional Random Fields combined FSM stemming method for Uyghur
  • Author
    Wumaier, Aishan ; Yibulayin, Tuergen ; Kadeer, Zaokere ; Tian, Shengwei
  • Author_Institution
    Sch. of Inf. Sci. & Eng., Xinjiang Univ., Urumqi, China
  • fYear
    2009
  • fDate
    8-11 Aug. 2009
  • Firstpage
    295
  • Lastpage
    299
  • Abstract
    This paper presents a stemming algorithm for Uyghur that combines a deterministic finite automaton (DFA) for noun suffixes with conditional random fields (CRF). Because Uyghur is an agglutinative language, stemming is an essential task for Uyghur language processing applications. We generate finite state machines (FSMs) for Uyghur noun inflectional suffixes by applying the morphotactic rules in reverse order. However, eight suffixes are similar to the ending part of some words, and these suffixes make the FSM ambiguous. We apply a CRF model to resolve this ambiguity. The paper describes the steps of generating the FSM, building the CRF suffix-identification model, and combining the CRF with the FSM.
  • Keywords
    deterministic automata; finite state machines; natural language processing; random processes; CRF suffix identifying model; DFA; FSM stemming method; Uyghur language processing; Uyghur noun inflectional suffix; agglutinative language; conditional random field model; deterministic finite automaton; finite state machine; morphotactic rule; reverse order; Algorithm design and analysis; Automata; Buildings; Dictionaries; Doped fiber amplifiers; Information science; Morphology; Natural language processing; Natural languages; Statistical analysis; Ambiguous FSM; CRF; Uyghur; stemming;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Information Technology, 2009. ICCSIT 2009. 2nd IEEE International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-4519-6
  • Electronic_ISBN
    978-1-4244-4520-2
  • Type
    conf
  • DOI
    10.1109/ICCSIT.2009.5234727
  • Filename
    5234727