DocumentCode
1909396
Title
A Unified Model for Joint Chinese Word Segmentation and POS Tagging with Heterogeneous Annotation Corpora
Author
Jiayi Zhao ; Xipeng Qiu ; Xuanjing Huang
Author_Institution
Sch. of Comput. Sci., Fudan Univ., Shanghai, China
fYear
2013
fDate
17-19 Aug. 2013
Firstpage
227
Lastpage
230
Abstract
Chinese word segmentation and part-of-speech tagging (S&T) are fundamental steps for more advanced Chinese language processing tasks. Exploiting heterogeneous annotation corpora for Chinese S&T has recently attracted growing research interest. In this paper, we propose a unified model for Chinese S&T with heterogeneous annotation corpora. We first automatically construct a loose and uncertain mapping between two representative heterogeneous corpora, the Penn Chinese Treebank (CTB) and PKU's People's Daily (PPD). We then regard Chinese S&T on the heterogeneous corpora as two "related" tasks and train our unified model on both corpora simultaneously. Experiments show that our unified model boosts performance on both corpora by using the shared information, and achieves significant improvements over state-of-the-art methods.
Keywords
computational linguistics; natural language processing; CTB; Chinese S&T; Chinese language processing tasks; Chinese word segmentation; PKU People's Daily; POS tagging; PPD; Penn Chinese Treebank; heterogeneous annotation corpora; loose mapping; part-of-speech tagging; uncertain mapping; unified model; heterogeneous annotation
fLanguage
English
Publisher
IEEE
Conference_Titel
Asian Language Processing (IALP), 2013 International Conference on
Conference_Location
Urumqi
Type
conf
DOI
10.1109/IALP.2013.64
Filename
6646042