Title :
Analysis on Effect Range of Context in Chinese Word Segmentation Based Word-Position Tagging
Author :
Xijie Wang ; An Guo
Author_Institution :
Sch. of Comput. & Inf. Eng., Anyang Normal Univ., Anyang, China
Abstract :
Chinese word segmentation (CWS) can be recast as a word-position tagging problem and solved with a conditional random field (CRF). This approach has greatly improved segmentation performance and is now widely used. When a CRF is trained on a corpus, the size of the feature window is the key factor in training effectiveness. To analyze the effective range of context, string sequence tagging segmentation experiments were performed on the Bakeoff2005 data with the CRF++ 0.53 toolkit. The results are: (1) the following context ("below") contributes more than the preceding context ("above"); (2) the feature window that influences segmentation performance is no larger than five, and the proper size is four or five.
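As an illustration of the word-position tagging and feature-window ideas in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes the common four-tag scheme (B, M, E, S), which the abstract does not specify, and the names `to_position_tags`, `window_features`, and `PAD` are hypothetical. The window is given as a number of preceding and following characters, so an asymmetric window that favors following context can be expressed directly.

```python
from typing import List, Tuple

PAD = "<PAD>"  # hypothetical padding symbol for positions outside the sentence


def to_position_tags(words: List[str]) -> List[Tuple[str, str]]:
    """Convert a segmented sentence into (character, tag) pairs.

    Assumes the 4-tag scheme: B (begin), M (middle), E (end), S (single).
    """
    pairs: List[Tuple[str, str]] = []
    for word in words:
        if not word:
            continue  # skip empty tokens
        if len(word) == 1:
            pairs.append((word, "S"))
            continue
        pairs.append((word[0], "B"))
        pairs.extend((ch, "M") for ch in word[1:-1])
        pairs.append((word[-1], "E"))
    return pairs


def window_features(chars: List[str], i: int, left: int, right: int) -> List[str]:
    """Return unigram features for the characters from i-left to i+right."""
    feats = []
    for offset in range(-left, right + 1):
        j = i + offset
        ch = chars[j] if 0 <= j < len(chars) else PAD
        feats.append(f"C[{offset}]={ch}")
    return feats


words = ["中文", "分词", "很", "重要"]
tagged = to_position_tags(words)
chars = [ch for ch, _ in tagged]
print(tagged)
# Window of size 4: one preceding character, the current one, two following.
print(window_features(chars, 2, left=1, right=2))
```

In a CRF toolkit such as CRF++, comparable windows are normally declared in a feature template file rather than computed by hand; the function above only makes the notion of window size and direction concrete.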
Keywords :
identification technology; natural language processing; random processes; word processing; Bakeoff2005; CRF++0.53 toolkit; CWS; Chinese word segmentation; conditional random field; effect range analysis; feature window; string sequence tagging segmentation; training effect; word-position tagging; Computers; Context; Educational institutions; Indexes; Performance analysis; Tagging; Training
Conference_Title :
Multimedia Information Networking and Security (MINES), 2012 Fourth International Conference on
Conference_Location :
Nanjing
Print_ISBN :
978-1-4673-3093-0
DOI :
10.1109/MINES.2012.76