DocumentCode :
3367364
Title :
A method of Chinese named entity recognition based on maximum entropy model
Author :
Hui, Ning ; Hua, Yang ; Ya-zhou, Tan ; Hao, Wu
Author_Institution :
Comput. Sci. & Technol. Coll., Harbin Eng. Univ., Harbin, China
fYear :
2009
fDate :
9-12 Aug. 2009
Firstpage :
2472
Lastpage :
2477
Abstract :
Chinese contains many implicit semantic features that can help Chinese named entity recognition. Moreover, one of the important strengths of the maximum entropy model is that it can integrate features of different granularity and level. With that in mind, this paper builds many Chinese named entity semantic knowledge bases by extracting information from a corpus. However, because of the limited corpus size and the data sparseness that is common in statistics-based methods, much significant information cannot be extracted. To resolve this problem, the paper applies the idea of semantic expansion to named entity recognition. Experiments show that, relative to using the unexpanded knowledge base, average recall increases by 1.17% and F value by 0.41%. In particular, the precision, recall and F value of complicated organization name recognition increase by 0.24%, 1.39% and 0.86% respectively.
Keywords :
character recognition; knowledge based systems; maximum entropy methods; natural language processing; statistical analysis; Chinese named entity recognition; data sparse; information extraction; maximum entropy model; semantic knowledge; statistic-based method; Automation; Computer science; Data mining; Educational institutions; Entropy; Hidden Markov models; Mechatronics; Natural language processing; Probability distribution; Space technology; Chinese Named Entity; Maximum Entropy Model; Semantic Expansion;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Mechatronics and Automation, 2009. ICMA 2009. International Conference on
Conference_Location :
Changchun
Print_ISBN :
978-1-4244-2692-8
Electronic_ISBN :
978-1-4244-2693-5
Type :
conf
DOI :
10.1109/ICMA.2009.5246408
Filename :
5246408