• DocumentCode
    3142548
  • Title
    Domain adaptive Chinese Word Segmentation based on domain knowledge and word-formation feature
  • Author
    Qin, Xiao ; Wu, Yuqian
  • Author_Institution
    Inst. of Comput. Sci. & Technol., Peking Univ., Beijing, China
  • fYear
    2011
  • fDate
    27-29 Nov. 2011
  • Firstpage
    344
  • Lastpage
    350
  • Abstract
    This paper describes a novel method for domain-adaptive Chinese Word Segmentation. Unlike traditional methods, our system exploits domain knowledge and word-formation features. First, we construct general knowledge by bootstrapping; this knowledge contains domain-independent information. A cross-domain model is then generated from the general knowledge and cross-domain knowledge. We also recognize the importance of word formation in word segmentation: the segmentation results are revised with a word-formation strategy. Word formation has received little study, yet this strategy plays a significant role. We test our system on the corpora provided by CIPS-SIGHAN 2010, where it achieves an F-score above 0.94 in all four domains. This performance demonstrates the effectiveness of our approach.
  • Keywords
    learning (artificial intelligence); natural language processing; statistical analysis; word processing; bootstrapping; cross-domain knowledge; cross-domain model; domain adaptive Chinese word segmentation; domain knowledge; general knowledge; word-formation feature; AV feature; Chinese word segmentation; domain adaption; word formation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering (NLP-KE), 2011 7th International Conference on
  • Conference_Location
    Tokushima
  • Print_ISBN
    978-1-61284-729-0
  • Type
    conf
  • DOI
    10.1109/NLPKE.2011.6138223
  • Filename
    6138223