DocumentCode
3317728
Title
POC-NLW Template Based Tagging Method for Chinese Word Segmentation
Author
Chen, Bo ; He, Hui ; Xu, Weiran ; Guo, Jun
Author_Institution
Sch. of Inf. Eng., Beijing Univ. of Posts & Telecommun.
Volume
2
fYear
2006
fDate
3-6 Nov. 2006
Firstpage
1423
Lastpage
1428
Abstract
In Chinese word segmentation, disambiguation and unknown word identification are the two key issues. In this paper, a system based on a two-stage strategy is constructed to deal with these problems. First, an n-gram based model performs the basic segmentation and, to some extent, disambiguation. Then, in the second stage, a language tagging template named POC-NLW is adopted to carry out a character sequence tagging procedure based on a hidden Markov model, which refines the results of the first stage and identifies unknown words. Several detailed experiments have been conducted on the SIGHAN Bakeoff 2005 corpus. Experimental results show that the method achieves high accuracy on word segmentation as well as on unknown word identification, with good processing efficiency. The method is characterized by good interoperability and extensibility across different kinds of unknown words, and is therefore applicable to practical Chinese information processing applications.
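The second stage described above tags each character with a position label using a hidden Markov model and then reads word boundaries off the tags. The record does not spell out the POC-NLW tag set, so the minimal sketch below substitutes the widely used B/M/E/S scheme (Begin, Middle, End of a word, or Single-character word). The tag set, the toy probabilities, and the names `viterbi` and `tags_to_words` are illustrative assumptions, not the authors' implementation; the sketch only shows how HMM character tagging can be decoded with the Viterbi algorithm.

```python
import math

# Assumed tag set: B/M/E/S character positions, standing in for the POC-NLW
# template, whose tags are not given in this record.
TAGS = ("B", "M", "E", "S")

# Legal transitions: a word continues (B->M/E, M->M/E) or a new word
# starts right after the previous one ends (E/S -> B/S).
ALLOWED = {"B": ("M", "E"), "M": ("M", "E"), "E": ("B", "S"), "S": ("B", "S")}

# Hypothetical toy parameters; a real system would estimate them from a corpus.
START_P = {"B": 0.6, "S": 0.4}
TRANS_P = {
    "B": {"M": 0.3, "E": 0.7},
    "M": {"M": 0.3, "E": 0.7},
    "E": {"B": 0.5, "S": 0.5},
    "S": {"B": 0.5, "S": 0.5},
}
EMIT_P = {
    "B": {"北": 0.5, "我": 0.05, "爱": 0.05},
    "M": {},
    "E": {"京": 0.5, "爱": 0.05},
    "S": {"我": 0.4, "爱": 0.4},
}


def viterbi(chars, start_p, trans_p, emit_p, floor=1e-8):
    """Return the most probable tag sequence for `chars` under the HMM.

    Scores are summed in log space; unseen events get probability `floor`.
    """
    if not chars:
        return []

    def lp(p):
        return math.log(max(p, floor))

    # best[i][tag] = (best log score of a path ending in `tag` at i, previous tag)
    best = [{t: (lp(start_p.get(t, 0)) + lp(emit_p[t].get(chars[0], 0)), None)
             for t in TAGS}]
    for i in range(1, len(chars)):
        column = {}
        for cur in TAGS:
            score, prev = max(
                (best[-1][p][0] + lp(trans_p[p].get(cur, 0)), p)
                for p in TAGS
                if cur in ALLOWED[p]
            )
            column[cur] = (score + lp(emit_p[cur].get(chars[i], 0)), prev)
        best.append(column)

    # A sentence must end on a word boundary (E or S); then follow backpointers.
    tags = [max(("E", "S"), key=lambda t: best[-1][t][0])]
    for i in range(len(chars) - 1, 0, -1):
        tags.append(best[i][tags[-1]][1])
    return tags[::-1]


def tags_to_words(chars, tags):
    """Cut the character sequence after every E or S tag."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


if __name__ == "__main__":
    sentence = "我爱北京"
    tags = viterbi(sentence, START_P, TRANS_P, EMIT_P)
    print(tags_to_words(sentence, tags))  # prints ['我', '爱', '北京']
```

In the paper's pipeline this tagging pass refines the output of the first-stage n-gram segmenter; the sketch decodes raw characters directly and does not model that interaction.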
Keywords
hidden Markov models; natural language processing; text analysis; Chinese information processing; Chinese word segmentation; POC-NLW template based tagging; character sequence tagging; hidden Markov model; n-gram based model; unknown word identification; word disambiguation; Dictionaries; Hidden Markov models; Information processing; Natural languages; Statistics; Tagging
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Intelligence and Security, 2006 International Conference on
Conference_Location
Guangzhou
Print_ISBN
1-4244-0605-6
Electronic_ISBN
1-4244-0605-6
Type
conf
DOI
10.1109/ICCIAS.2006.295295
Filename
4076201