Title :
Combining Multi-knowledge for Chinese Word Segmentation Disambiguation
Author :
Qin Ying ; Zhang Suxiang ; Wang Xiaojie
Author_Institution :
Sch. of Inf. Eng., Beijing Univ. of Posts & Telecommun.
Abstract :
In the task of Chinese word segmentation, there are two main types of segmentation ambiguity: overlapping ambiguity and combination ambiguity. The paper analyzes the properties of these ambiguities and proposes a multi-knowledge approach to disambiguation. Multi-knowledge refers to knowledge drawn from statistics over a large corpus together with syntactic, semantic, or discourse information about ambiguous words. A class-based N-gram model and a maximum entropy model are applied to combine this multi-knowledge and resolve the ambiguities.
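The two ambiguity types can be illustrated with a small sketch. It is an illustration only and is not the authors' method. It assumes the usual definitions: an overlapping ambiguity arises when two lexicon words share characters in the text (ABC read as AB/C or A/BC), and a combination ambiguity arises when a word AB can also be split into the words A and B. The toy lexicon `LEXICON` and the functions `overlapping_ambiguities` and `combination_ambiguities` are hypothetical. The code only *detects* candidate ambiguities. Choosing the right reading from context is the step the paper addresses with class-based N-grams and maximum entropy.

```python
from typing import List, Set, Tuple

# Hypothetical toy lexicon, for illustration only (not from the paper).
LEXICON: Set[str] = {"结合", "合成", "结", "合", "成", "马上", "马", "上", "分子"}


def overlapping_ambiguities(
    text: str, lexicon: Set[str], max_len: int = 4
) -> List[Tuple[str, str]]:
    """Return pairs (w1, w2) of multi-character lexicon words whose spans
    cross in `text`: w1 = text[i:j], w2 = text[k:l] with i < k < j < l."""
    # Every occurrence of a lexicon word of length 2..max_len, as (start, end).
    spans = [
        (i, j)
        for i in range(len(text))
        for j in range(i + 2, min(len(text), i + max_len) + 1)
        if text[i:j] in lexicon
    ]
    pairs = []
    for i, j in spans:
        for k, l in spans:
            if i < k < j < l:  # the two words share at least one character
                pairs.append((text[i:j], text[k:l]))
    return pairs


def combination_ambiguities(text: str, lexicon: Set[str]) -> List[str]:
    """Return multi-character lexicon words in `text` that can also be split
    into two shorter lexicon words (AB where AB, A and B are all words)."""
    found = []
    for i in range(len(text)):
        for j in range(i + 2, len(text) + 1):
            word = text[i:j]
            if word in lexicon and any(
                word[:m] in lexicon and word[m:] in lexicon
                for m in range(1, len(word))
            ):
                found.append(word)
    return found


if __name__ == "__main__":
    # 结合/成/分子 vs 结/合成/分子: an overlapping ambiguity.
    print(overlapping_ambiguities("结合成分子", LEXICON))
    # 马上 ("immediately") vs 马/上 ("on the horse"): a combination ambiguity.
    print(combination_ambiguities("他马上来", LEXICON))
```

Detection is a simple lexicon lookup. Deciding between readings needs the contextual knowledge (corpus statistics, syntactic, semantic, and discourse cues) that the abstract describes.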
Keywords :
maximum entropy methods; natural language processing; Chinese word segmentation; N-gram; ambiguous words; combination ambiguity; disambiguation; discourse information; large corpus statistic; maximum entropy model; multi-knowledge; overlapping ambiguity; semantic information; syntactic information; Concrete; Entropy; Grounding; Humans; Information processing; Intelligent systems; Natural languages; Power engineering and energy; Statistics; Vocabulary;
Conference_Title :
Intelligent Systems Design and Applications, 2006. ISDA '06. Sixth International Conference on
Conference_Location :
Jinan
Print_ISBN :
0-7695-2528-8
DOI :
10.1109/ISDA.2006.124