• DocumentCode
    1909396
  • Title
    A Unified Model for Joint Chinese Word Segmentation and POS Tagging with Heterogeneous Annotation Corpora
  • Author
    Jiayi Zhao; Xipeng Qiu; Xuanjing Huang
  • Author_Institution
    Sch. of Comput. Sci., Fudan Univ., Shanghai, China
  • fYear
    2013
  • fDate
    17-19 Aug. 2013
  • Firstpage
    227
  • Lastpage
    230
  • Abstract
    Chinese word segmentation and part-of-speech tagging (S&T) are fundamental steps for more advanced Chinese language processing tasks. Recently, exploiting heterogeneous annotation corpora for Chinese S&T has attracted increasing research interest. In this paper, we propose a unified model for Chinese S&T with heterogeneous annotation corpora. We first automatically construct a loose and uncertain mapping between two representative heterogeneous corpora, the Penn Chinese Treebank (CTB) and PKU's People's Daily (PPD). We then treat Chinese S&T on the two heterogeneous corpora as two "related" tasks and train our unified model on both corpora simultaneously. Experiments show that, by using the shared information, our unified model improves performance on both heterogeneous corpora and achieves significant improvements over state-of-the-art methods.
  • Keywords
    computational linguistics; natural language processing; CTB; Chinese S&T; Chinese language processing tasks; Chinese word segmentation; PKU People's Daily; POS tagging; PPD; Penn Chinese Treebank; heterogeneous annotation corpora; loose mapping; part-of-speech tagging; uncertain mapping; unified model; heterogeneous annotation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Asian Language Processing (IALP), 2013 International Conference on
  • Conference_Location
    Urumqi
  • Type
    conf
  • DOI
    10.1109/IALP.2013.64
  • Filename
    6646042