• DocumentCode
    176798
  • Title

    A Khmer named entity recognition method by fusing language characteristics

  • Author

    Huashan Pan ; Xin Yan ; Zhengtao Yu ; Jianyi Guo

  • Author_Institution
    Sch. of Inf. Eng. & Autom., Kunming Univ. of Sci. & Technol., Kunming, China
  • fYear
    2014
  • fDate
    May 31 2014-June 2 2014
  • Firstpage
    4003
  • Lastpage
    4007
  • Abstract
    To address the problem of Khmer named entity recognition, we propose a method that fuses Khmer entity characteristics with universal feature templates. Relatively stable entities, namely time expressions and digital expressions, are recognized using artificial (hand-written) rules. Complex entities, namely names, locations, and organizations, are recognized by a complex entity recognition model built with the Conditional Random Fields algorithm, which takes words, parts of speech, contextual information, and Khmer entity characteristics into consideration. Experimental results show that fusing Khmer entity characteristics improves named entity recognition.
  • Keywords
    natural language processing; random processes; text analysis; Khmer entity characteristics; Khmer named entity recognition method; artificial rules; complex entity recognition model; conditional random fields algorithm; contextual information; digital expressions; language characteristics; locations; organizations; part of speech; time expressions; universal feature templates; Character recognition; Computational linguistics; Educational institutions; Electronic mail; Hidden Markov models; Natural languages; Speech recognition; Conditional Random Fields; Khmer; entity characteristics; named entity recognition; rules;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Control and Decision Conference (2014 CCDC), The 26th Chinese
  • Conference_Location
    Changsha
  • Print_ISBN
    978-1-4799-3707-3
  • Type

    conf

  • DOI
    10.1109/CCDC.2014.6852881
  • Filename
    6852881
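
The two-stage approach described in the abstract (rules for stable entities, feature templates feeding a Conditional Random Fields model for complex ones) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the regular expression, the label names `TIME`, `NUM`, and `O`, the function names, the one-token context window, and the `entity_cues` set standing in for "Khmer entity characteristics" are all assumptions, since the record does not give the actual rules or templates. The feature dictionaries are the kind of input a CRF toolkit typically consumes; no particular toolkit is assumed.

```python
import re

# Hypothetical rule: a run of digits (ASCII or Khmer; \d matches both in
# Python str patterns), optionally joined by '.', ',', '/', ':' or '-'.
NUMERIC = re.compile(r"\d+(?:[.,/:-]\d+)*")


def tag_stable_entities(tokens):
    """Label each token TIME, NUM, or O with simple hand-written rules."""
    labels = []
    for tok in tokens:
        if NUMERIC.fullmatch(tok):
            # Assumption: date/time separators mark a time expression.
            labels.append("TIME" if re.search(r"[/:-]", tok) else "NUM")
        else:
            labels.append("O")
    return labels


def token_features(tokens, pos_tags, i, entity_cues=frozenset()):
    """Build one feature dict for token i: word, POS, context, cue flag."""
    feats = {
        "w0": tokens[i],                  # current word
        "p0": pos_tags[i],                # current part of speech
        "cue": tokens[i] in entity_cues,  # hypothetical entity-cue lexicon
    }
    # Context window of one token on each side, padded at sentence edges.
    for off in (-1, 1):
        j = i + off
        inside = 0 <= j < len(tokens)
        feats[f"w{off:+d}"] = tokens[j] if inside else "<PAD>"
        feats[f"p{off:+d}"] = pos_tags[j] if inside else "<PAD>"
    return feats
```

Calling `token_features` for every position in a sentence yields one feature dictionary per token. Tokens already labeled by `tag_stable_entities` could be passed through unchanged, leaving only the remaining tokens for the statistical model.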