• DocumentCode
    2224669
  • Title
    Person name identification in Chinese documents using finite state automata
  • Author
    Shen, Bing ; Zhongfei ; Yuan, Chunfa
  • Author_Institution
    Comput. Sci. Dept., Binghamton Univ., NY, USA
  • fYear
    2003
  • fDate
    13-16 Oct. 2003
  • Firstpage
    478
  • Lastpage
    481
  • Abstract
    This research addresses the automatic identification and extraction of person names in Chinese text documents. Solutions to this problem have immediate and extensive applications in many areas, especially in applications related to Web intelligent agents, such as Web search engines, Web data mining, and automatic Web information analysis. While finite state automata (FSA) based techniques have been used extensively in natural language processing (NLP) and information extraction (IE) for English, they have not yet been widely used in processing Chinese text; in particular, to our knowledge, no work has been reported on using FSA for person name identification and extraction. Motivated by this gap, we propose an FSA-based person name identification method called NICF. Evaluations show that NICF works very well in terms of identification recall and accuracy, as well as processing speed, and thus holds great promise for future applications.
  • Keywords
    Web sites; automata theory; data mining; finite state machines; search engines; text analysis; Chinese document; Chinese text document; FSA; IE; NICF; NLP; Web information analysis; Web intelligent agents; Web search engine; automatic extraction; automatic identification; finite state automata; person name identification; Application software; Automata; Computer science; Data mining; Information analysis; Intelligent agent; Internet; Robustness; Search engines; Web search;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Agent Technology, 2003. IAT 2003. IEEE/WIC International Conference on
  • Print_ISBN
    0-7695-1931-8
  • Type
    conf
  • DOI
    10.1109/IAT.2003.1241125
  • Filename
    1241125
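The abstract says NICF is FSA-based but does not describe its automaton, so the following is only a generic illustration of the underlying idea, not the authors' method. It is a minimal sketch assuming a hypothetical three-transition automaton (one surname character, then one or two given-name characters) that is run from every position in the text, keeping the longest accepting match. The lexicons `SURNAMES` and `GIVEN_NAME_CHARS` and the functions `step` and `find_names` are invented for this example.

```python
from typing import List, Optional, Tuple

# Hypothetical toy lexicons for illustration only; NOT taken from the paper.
SURNAMES = frozenset("王李张刘陈杨黄赵")
GIVEN_NAME_CHARS = frozenset("伟芳娜敏静丽强磊军明华")

# Automaton states: start, after surname, after 1 or 2 given-name characters.
START, SURNAME, GIVEN1, GIVEN2 = range(4)
ACCEPTING = {GIVEN1, GIVEN2}


def step(state: int, ch: str) -> Optional[int]:
    """Transition function; None means there is no transition (dead state)."""
    if state == START and ch in SURNAMES:
        return SURNAME
    if state == SURNAME and ch in GIVEN_NAME_CHARS:
        return GIVEN1
    if state == GIVEN1 and ch in GIVEN_NAME_CHARS:
        return GIVEN2
    return None


def find_names(text: str) -> List[Tuple[int, str]]:
    """Scan left to right, returning (offset, candidate name) longest matches."""
    results = []
    i = 0
    while i < len(text):
        state, j, last_accept = START, i, -1
        # Run the automaton from position i until it has no transition.
        while j < len(text):
            nxt = step(state, text[j])
            if nxt is None:
                break
            state, j = nxt, j + 1
            if state in ACCEPTING:
                last_accept = j  # remember the longest accepted prefix
        if last_accept != -1:
            results.append((i, text[i:last_accept]))
            i = last_accept  # resume scanning after the matched name
        else:
            i += 1
    return results
```

For example, `find_names("昨天王伟和李明华在北京见面。")` yields `[(2, "王伟"), (5, "李明华")]`. A pure lexicon automaton like this over-generates whenever a surname character happens to precede common characters, so a practical system would need further constraints (context cues, statistics, or richer states); how NICF handles this is not stated in this record.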