Title :
New word detection algorithm for Chinese based on extraction of local context information
Author :
Zeng, Hua-Lin ; Zhou, Chang-Le ; Shi, Xiao-Dong ; Li, Tang-Qiu ; Su, Chang
Author_Institution :
Dept. of Cognitive Sci., Xiamen Univ., Xiamen, China
Abstract :
Chinese word segmentation is an important problem in Chinese text processing. Traditional segmentation methods that depend on an existing dictionary perform poorly when they encounter unknown words. This paper proposes a segmentation algorithm for Chinese based on extracting local context information. It adds the context information of the test text to a local PPM statistical model to guide the detection of new words. The algorithm, which focuses on online segmentation and new word detection, performs well in both closed and open tests, and outperforms some well-known Chinese segmentation systems to a certain extent.
Keywords :
information retrieval; natural language processing; statistical analysis; text analysis; word processing; Chinese segmentation; Chinese text processing; PPM statistical model; local context information extraction; word detection algorithm; Context modeling; Data mining; Decoding; Detection algorithms; Hidden Markov models; Intelligent systems; Knowledge engineering; Natural languages; Predictive models; Testing
Conference_Title :
Intelligent System and Knowledge Engineering, 2008. ISKE 2008. 3rd International Conference on
Conference_Location :
Xiamen
Print_ISBN :
978-1-4244-2196-1
Electronic_ISBN :
978-1-4244-2197-8
DOI :
10.1109/ISKE.2008.4731038