Title :
Improving Feature Extraction in Named Entity Recognition Based on Maximum Entropy Model
Author :
Jiang, Wei ; Guan, Yi ; Wang, Xiao-long
Author_Institution :
Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol.
Abstract :
This paper proposes a new method for improving feature extraction in named entity recognition. First, context features and entity features are extracted by separate algorithms: triggers selected by mutual information, information gain, average mutual information, and similar measures are used to enhance the context features, and rough set theory is used to extract the entity features. Second, a word clustering method is presented to improve feature expansion, which makes feature selection easier and effectively mitigates the sparse data problem. Finally, all the features are added to the maximum entropy model. Experiments confirm that the method is effective. The method has been used in our word segmenter, which participated in the International SIGHAN-2005 Evaluation and ranked first in the open test on the MSR corpus.
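The abstract names mutual information as one criterion for selecting context triggers but gives no formula. The following is only a minimal sketch of one common reading, pointwise mutual information between a context word and an entity label, and is not the authors' algorithm. The function name pmi_triggers, the (context_words, label) sample format, and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def pmi_triggers(samples, min_count=1):
    """Score (word, label) pairs by pointwise mutual information.

    samples: list of (context_words, label) pairs (a hypothetical format).
    Returns a dict mapping (word, label) to PMI in bits.
    """
    n = len(samples)
    word_counts = Counter()
    label_counts = Counter()
    joint_counts = Counter()
    for words, label in samples:
        label_counts[label] += 1
        # Count each word once per sample so probabilities are per-sample.
        for word in set(words):
            word_counts[word] += 1
            joint_counts[(word, label)] += 1

    scores = {}
    for (word, label), joint in joint_counts.items():
        if joint < min_count:
            continue
        # PMI = log2( P(word, label) / (P(word) * P(label)) )
        scores[(word, label)] = math.log2(
            joint * n / (word_counts[word] * label_counts[label])
        )
    return scores


# Toy data, invented for illustration only.
samples = [
    (["Mr", "said"], "PER"),
    (["Mr", "visited"], "PER"),
    (["in", "said"], "LOC"),
    (["in", "city"], "LOC"),
]
scores = pmi_triggers(samples)
# Candidate triggers for PER, highest PMI first.
per_triggers = sorted(
    (w for (w, label) in scores if label == "PER"),
    key=lambda w: scores[(w, "PER")],
    reverse=True,
)
```

In this toy data, "Mr" co-occurs only with PER and scores higher than "said", which appears equally often with both labels. High-scoring pairs could then be added as binary trigger features to a maximum entropy model. How the paper combines mutual information with information gain and average mutual information is not stated in the abstract.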
Keywords :
feature extraction; maximum entropy methods; rough set theory; text analysis; word processing; International SIGHAN-2005 Evaluation; MSR corpus; average mutual information; maximum entropy model; named entity recognition; thesaurus; word cluster method; Clustering algorithms; Computer science; Cybernetics; Data mining; Decision trees; Electronic mail; Entropy; Hidden Markov models; Machine learning; Mutual information; Set theory; Word cluster
Conference_Title :
Machine Learning and Cybernetics, 2006 International Conference on
Conference_Location :
Dalian, China
Print_ISBN :
1-4244-0061-9
DOI :
10.1109/ICMLC.2006.258916