  • DocumentCode
    3102777
  • Title

    Improving Chinese named entity recognition with lexical information

  • Author

    Fu, Guo-hong

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Heilongjiang Univ., Harbin, China
  • Volume
    6
  • fYear
    2009
  • fDate
    12-15 July 2009
  • Firstpage
    3487
  • Lastpage
    3491
  • Abstract
    Named entity recognition (NER) plays a critical role in many natural language processing applications. Chinese NER is usually formalized as a chunking task. However, most formulations do not distinguish named entities from common words, which makes it difficult to exploit lexical cues for NER. In this paper we propose a two-level IOB2 representation that merges lexical chunks and entity chunks, and develop a morpheme-based chunking system for Chinese NER. The system works in three main steps. Given a plain Chinese sentence, a morpheme segmenter first segments it into a sequence of morphemes. A lexical chunker then tags each morpheme with a lexical chunk tag indicating its position in forming a word of a particular type. Finally, an entity chunker labels each morpheme with a hybrid chunk tag that also carries the related entity boundary and category information, if any. Our experiments on the IEER-99 and MET2 data demonstrate a significant improvement in NER performance from using entity-internal part-of-speech information. We also show that lexical chunking quality is important for NER results.
  • Keywords
    natural language processing; text analysis; Chinese named entity recognition; entity chunks; lexical chunks; lexical information; morpheme-based chunking system; Application software; Computer science; Cybernetics; Data mining; Machine learning; Testing; Text mining; Text recognition; White spaces; Entity chunking; Information extraction; Named entity recognition; lexical chunking
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2009 International Conference on
  • Conference_Location
    Baoding
  • Print_ISBN
    978-1-4244-3702-3
  • Electronic_ISBN
    978-1-4244-3703-0
  • Type

    conf

  • DOI
    10.1109/ICMLC.2009.5212793
  • Filename
    5212793
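
The two-level IOB2 representation described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the paper's actual scheme: the function names `iob2` and `two_level_tags`, the word-type labels (`ns`, `n`, `v`), the `|` separator in the hybrid tag, and the toy morpheme segmentation are all hypothetical. The sketch only shows the general idea of giving each morpheme one IOB2 tag for its position inside a word and one for its position inside an entity.

```python
from typing import List, Tuple

def iob2(n: int, label: str) -> List[str]:
    """IOB2 tags for one chunk of n units: B-label first, then I-label."""
    if n <= 0:
        return []
    return [f"B-{label}"] + [f"I-{label}"] * (n - 1)

def two_level_tags(
    words: List[Tuple[List[str], str]],
    entities: List[Tuple[int, int, str]],
) -> List[Tuple[str, str]]:
    """
    words:    (morphemes, word_type) per word, in sentence order.
    entities: (first_word, last_word_exclusive, entity_type) spans.
    Returns (morpheme, hybrid_tag) pairs; the hybrid tag format
    "lexical|entity" is an assumption of this sketch.
    """
    morphemes: List[str] = []
    lexical: List[str] = []
    word_of: List[int] = []  # index of the word each morpheme belongs to

    # Level 1: lexical chunks (position of each morpheme within its word).
    for wi, (morphs, wtype) in enumerate(words):
        morphemes.extend(morphs)
        lexical.extend(iob2(len(morphs), wtype))
        word_of.extend([wi] * len(morphs))

    # Level 2: entity chunks; morphemes outside any entity stay "O".
    entity = ["O"] * len(morphemes)
    for start, end, etype in entities:
        idx = [i for i, w in enumerate(word_of) if start <= w < end]
        for i, tag in zip(idx, iob2(len(idx), etype)):
            entity[i] = tag

    return [(m, f"{l}|{e}") for m, l, e in zip(morphemes, lexical, entity)]

# Toy input: the segmentation and labels below are illustrative only.
words = [(["黑龙江"], "ns"), (["大", "学"], "n"), (["位于"], "v")]
tagged = two_level_tags(words, [(0, 2, "ORG")])
for morpheme, tag in tagged:
    print(morpheme, tag)
```

In this toy input the first two words form one organization entity, so their morphemes carry both a word-level tag and an entity-level tag, while the last word is tagged `O` at the entity level.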