• DocumentCode
    1652383
  • Title
    Maximum Entropy combined FSM stemming method for Uyghur
  • Author
    Wumaier, Aishan ; Kadeer, Zaokere ; Tursun, Parida ; Tian, Shengwei
  • Author_Institution
    Sch. of Inf. Sci. & Eng., Xinjiang Univ., Urumqi, China
  • fYear
    2009
  • Firstpage
    51
  • Lastpage
    55
  • Abstract
    This paper presents a stemming algorithm that combines a deterministic finite automaton (DFA) for Uyghur noun suffixes with a maximum entropy (MaxEnt) model. Because Uyghur is an agglutinative language, stemming is an essential task for Uyghur language processing applications. We generate finite state machines (FSMs) for Uyghur noun inflectional suffixes by applying the morphotactic rules in reverse order. However, eight suffixes are similar to the ending part of some words, which makes the FSM ambiguous. We apply the MaxEnt model to resolve this ambiguity. The paper describes the steps of generating the FSM, building the MaxEnt suffix-identification model, and combining MaxEnt with the FSM.
  • Keywords
    deterministic automata; finite state machines; maximum entropy methods; natural language processing; FSM stemming method; MaxEnt model; Uyghur language processing; Uyghur noun inflectional suffix DFA; agglutinative nature; finite state machine; maximum entropy; morphotactic rule; Algorithm design and analysis; Automata; Buildings; Doped fiber amplifiers; Entropy; Information science; Morphology; Natural languages; Software libraries; Statistical analysis; FSM; Maximum Entropy; Uyghur; stemming
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Speech Database and Assessments, 2009 Oriental COCOSDA International Conference on
  • Conference_Location
    Urumqi
  • Print_ISBN
    978-1-4244-4400-7
  • Electronic_ISBN
    978-1-4244-4400-7
  • Type
    conf
  • DOI
    10.1109/ICSDA.2009.5278378
  • Filename
    5278378
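
The approach summarized in the abstract (strip suffixes in reverse morphotactic order, and consult a classifier when a suffix-like ending may really belong to the stem) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the suffix inventory is a small Latin-script toy set chosen for the example rather than the paper's suffix list, the slot cascade is a simplification of an explicit DFA, and `toy_classifier` is a hypothetical stand-in for a trained MaxEnt suffix-identification model.

```python
from typing import Callable, List, Tuple

# Toy suffix slots in morphotactic order: stem + PLURAL + CASE.
# Assumption: illustrative inventory only, not the paper's suffix list.
SLOTS: List[Tuple[str, Tuple[str, ...]]] = [
    ("PLURAL", ("lar", "ler")),
    ("CASE", ("ning", "din", "ni", "da")),
]

# Decides whether `form` at the end of `rest + form` is a real suffix.
SuffixJudge = Callable[[str, str], bool]


def stem(word: str, is_suffix: SuffixJudge) -> Tuple[str, List[Tuple[str, str]]]:
    """Strip suffixes right to left by walking the slots in reverse order."""
    stripped: List[Tuple[str, str]] = []
    for slot, forms in reversed(SLOTS):
        for form in forms:
            if not word.endswith(form):
                continue
            rest = word[: -len(form)]
            # Never strip the whole word; defer ambiguous cases to the judge.
            if rest and is_suffix(rest, form):
                word = rest
                stripped.append((slot, form))
                break
    return word, stripped


def toy_classifier(rest: str, form: str) -> bool:
    """Hypothetical stand-in for a MaxEnt model: a tiny exception list of
    words whose ending only looks like a suffix."""
    return (rest + form) not in {"dollar"}


if __name__ == "__main__":
    print(stem("kitablarni", toy_classifier))
    print(stem("dollar", lambda rest, form: True))   # pure FSM over-strips
    print(stem("dollar", toy_classifier))            # judge keeps the stem
```

In a real system the judge would return a decision from a model trained on contextual features, and the final comparison in the usage lines shows why the abstract calls the bare FSM ambiguous: without the judge, a stem-final string that matches a suffix is removed.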