DocumentCode :
1938327
Title :
A New Machine Learning Method for Chinese Overlapping Disambiguity--Conditional Random Fields
Author :
Xiong, Ying ; Zhu, Jie
Author_Institution :
Shanghai Jiao Tong Univ., Shanghai
Volume :
7
fYear :
2007
fDate :
19-22 Aug. 2007
Firstpage :
3922
Lastpage :
3926
Abstract :
Conditional random fields (CRFs) are employed in this paper to resolve overlapping ambiguity in Chinese word segmentation. Whereas traditional methods treat Chinese overlapping ambiguity as a classification problem, the proposed approach regards the task as a sequence labeling problem. The main benefit of this method is that it can handle overlapping ambiguous strings of any length, regardless of whether they are pseudo ambiguities or true ambiguities. Several methods are tested on the same training and test corpora. The experimental results show that the CRF models achieve state-of-the-art performance: compared with the maximum entropy classifier and the traditional word bigram model, accuracy increases by 3.98% and 9.27%, respectively.
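To illustrate what "treating overlapping ambiguity as sequence labeling" can look like, here is a minimal sketch, not the authors' implementation. It assumes a character-level B/M/E/S tag set (begin, middle, end of a multi-character word, single-character word) and a linear-chain CRF with a simplified, hypothetical feature template (current character plus the left and right character bigrams). The weights in `DEMO_WEIGHTS` are hand-set for illustration rather than trained, and the example string 结合成分子 is a textbook-style overlapping ambiguity (结合/合成), not data from the paper. Because decoding only needs the argmax, the partition function is omitted; Viterbi search finds the highest-scoring tag sequence, and a brute-force check confirms it on short inputs.

```python
from itertools import product

TAGS = ("B", "M", "E", "S")  # begin / middle / end of multi-char word, single-char word
NEG = -1e9  # large penalty standing in for a forbidden transition

# Which tag may follow which, so that every decoded sequence forms whole words.
ALLOWED = {"B": {"M", "E"}, "M": {"M", "E"}, "E": {"B", "S"}, "S": {"B", "S"}}

# Hypothetical hand-set feature weights keyed by (feature, tag); a real CRF learns these.
DEMO_WEIGHTS = {
    ("C0C1=结合", "B"): 2.0, ("C-1C0=结合", "E"): 2.0,  # evidence for the word 结合
    ("C0C1=合成", "B"): 1.0, ("C-1C0=合成", "E"): 1.0,  # weaker evidence for 合成
    ("C0=成", "S"): 1.5,                                  # 成 as a single-char word
    ("C0C1=分子", "B"): 2.0, ("C-1C0=分子", "E"): 2.0,  # evidence for 分子
}

def transition(prev, cur):
    return 0.0 if cur in ALLOWED[prev] else NEG

def start_score(tag):
    return 0.0 if tag in ("B", "S") else NEG  # a sentence starts a new word

def end_score(tag):
    return 0.0 if tag in ("E", "S") else NEG  # a sentence must close its last word

def emission(chars, i, tag, weights):
    """Sum the weights of the (simplified) features that fire at position i."""
    feats = [f"C0={chars[i]}"]
    if i > 0:
        feats.append(f"C-1C0={chars[i - 1]}{chars[i]}")
    if i + 1 < len(chars):
        feats.append(f"C0C1={chars[i]}{chars[i + 1]}")
    return sum(weights.get((f, tag), 0.0) for f in feats)

def sequence_score(chars, tags, weights):
    """Unnormalized linear-chain CRF score of one complete tag sequence."""
    s = start_score(tags[0]) + emission(chars, 0, tags[0], weights)
    for i in range(1, len(chars)):
        s += transition(tags[i - 1], tags[i]) + emission(chars, i, tags[i], weights)
    return s + end_score(tags[-1])

def viterbi(chars, weights):
    """Return the highest-scoring tag sequence by dynamic programming."""
    delta = [{t: start_score(t) + emission(chars, 0, t, weights) for t in TAGS}]
    back = [{}]
    for i in range(1, len(chars)):
        row, ptr = {}, {}
        for t in TAGS:
            prev = max(TAGS, key=lambda p: delta[i - 1][p] + transition(p, t))
            row[t] = delta[i - 1][prev] + transition(prev, t) + emission(chars, i, t, weights)
            ptr[t] = prev
        delta.append(row)
        back.append(ptr)
    last = max(TAGS, key=lambda t: delta[-1][t] + end_score(t))
    tags = [last]
    for i in range(len(chars) - 1, 0, -1):  # follow back-pointers to the start
        tags.append(back[i][tags[-1]])
    return tags[::-1]

def brute_force_best(chars, weights):
    """Exhaustive argmax score; only feasible for short strings (4**n sequences)."""
    return max(sequence_score(chars, list(t), weights)
               for t in product(TAGS, repeat=len(chars)))

def tags_to_words(chars, tags):
    words, cur = [], ""
    for c, t in zip(chars, tags):
        cur += c
        if t in ("E", "S"):
            words.append(cur)
            cur = ""
    return words + ([cur] if cur else [])

if __name__ == "__main__":
    chars = list("结合成分子")
    tags = viterbi(chars, DEMO_WEIGHTS)
    print(tags, tags_to_words(chars, tags))  # ['B', 'E', 'S', 'B', 'E'] ['结合', '成', '分子']
```

The point of the labeling view is that the ambiguous span 结合成 is resolved jointly with its context through the transition constraints and overlapping features, rather than by a separate classifier choosing between two fixed segmentations; this is also why string length poses no special difficulty.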
Keywords :
entropy; learning (artificial intelligence); natural language processing; pattern classification; random processes; Chinese overlapping ambiguity; Chinese overlapping disambiguity; Chinese word segmentation; classification problem; conditional random fields; machine learning method; sequence labeling problem; Cybernetics; Educational institutions; Entropy; Hidden Markov models; Humans; Labeling; Learning systems; Machine learning; Support vector machines; Testing; Chinese word segmentation; Conditional random fields; Maximum Entropy classifier; Overlapping ambiguity; Word bigram model;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Cybernetics, 2007 International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4244-0973-0
Electronic_ISBN :
978-1-4244-0973-0
Type :
conf
DOI :
10.1109/ICMLC.2007.4370831
Filename :
4370831