Title :
Chinese Word Segmentation as POC-NLW Tagging
Author :
Chen, Bo ; He, Hui ; Guo, Jun ; Xu, Weiran
Author_Institution :
Sch. of Inf. Eng., Beijing Univ. of Posts & Telecommun.
Abstract :
In Chinese word segmentation, disambiguation and unknown word identification remain the two key open issues. To handle both in a uniform way, this paper presents a language tagging template, named POC-NLW, that explores the word creation mechanisms of the Chinese language at the character level. Based on this template, a hidden Markov model tagger is constructed to implement word segmentation as character tagging. In this method, basic word segmentation, disambiguation, and unknown word identification are integrated and accomplished in one unified process. Experimental results on the SIGHAN Bakeoff 2005 corpus show that the method achieves high accuracy on word segmentation, especially on unknown word identification, with appreciable processing efficiency. The method offers good interoperability and extensibility across different kinds of words, and is therefore applicable to practical Chinese information processing applications.
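The idea of treating segmentation as character tagging with a hidden Markov model can be illustrated with a minimal sketch. The abstract does not define the POC-NLW tag set, so the example below substitutes the common B/M/E/S position tags (begin, middle, end, single) as a stand-in. The probability tables, the `floor` smoothing value, and the function names `viterbi` and `tags_to_words` are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math

# Stand-in tag set (assumption): B=begin, M=middle, E=end, S=single-character word.
TAGS = ("B", "M", "E", "S")
END_TAGS = ("E", "S")  # a sentence can only end on a word boundary

# Hypothetical toy parameters; a real tagger would estimate these from a corpus.
START = {"B": 0.6, "S": 0.4}
TRANS = {
    "B": {"M": 0.3, "E": 0.7},
    "M": {"M": 0.3, "E": 0.7},
    "E": {"B": 0.6, "S": 0.4},
    "S": {"B": 0.6, "S": 0.4},
}


def log(p):
    """Log probability, mapping zero to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(chars, emit, floor=1e-6):
    """Return the most likely tag sequence for chars.

    emit maps tag -> {char: probability}; unseen characters get `floor`.
    """
    if not chars:
        return []

    def emit_log(tag, ch):
        return math.log(max(emit.get(tag, {}).get(ch, floor), floor))

    score = [{t: log(START.get(t, 0.0)) + emit_log(t, chars[0]) for t in TAGS}]
    back = []
    for ch in chars[1:]:
        prev, cur, ptr = score[-1], {}, {}
        for t in TAGS:
            # Best predecessor tag for reaching tag t at this position.
            best = max(TAGS, key=lambda p: prev[p] + log(TRANS[p].get(t, 0.0)))
            cur[t] = prev[best] + log(TRANS[best].get(t, 0.0)) + emit_log(t, ch)
            ptr[t] = best
        score.append(cur)
        back.append(ptr)

    # Trace back from the best admissible final tag.
    path = [max(END_TAGS, key=lambda t: score[-1][t])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def tags_to_words(chars, tags):
    """Cut the character sequence after every E or S tag."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in END_TAGS:
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words
```

Calling `tags_to_words(text, viterbi(text, emit))` yields a segmentation of `text`. Because every character receives a tag regardless of whether its word is in a dictionary, known and unknown words are handled by the same decoding pass, which is the kind of unified process the abstract describes.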
Keywords :
hidden Markov models; natural language processing; speech processing; speech recognition; Chinese word segmentation; POC-NLW tagging; SIGHAN Bakeoff2005; character tagging; disambiguation; hidden Markov model; language tagging template; words identification; Costs; Dictionaries; Helium; Hidden Markov models; Information processing; Land mobile radio; Natural language processing; Natural languages; Support vector machines; Tagging
Conference_Title :
Signal Processing, 2006 8th International Conference on
Conference_Location :
Beijing
Print_ISBN :
0-7803-9736-3
Electronic_ISBN :
0-7803-9736-3
DOI :
10.1109/ICOSP.2006.345894