DocumentCode
3367364
Title
A method of Chinese named entity recognition based on maximum entropy model
Author
Hui, Ning ; Hua, Yang ; Ya-zhou, Tan ; Hao, Wu
Author_Institution
Comput. Sci. & Technol. Coll., Harbin Eng. Univ., Harbin, China
fYear
2009
fDate
9-12 Aug. 2009
Firstpage
2472
Lastpage
2477
Abstract
Chinese contains many implicit semantic features that can aid Chinese named entity recognition. Moreover, an important strength of the maximum entropy model is that it can combine features of different granularities and levels. With this in mind, several semantic knowledge bases for Chinese named entities were built in this paper by extracting information from a corpus. However, because of the limited corpus size and the data sparseness that is common in statistics-based methods, much significant information cannot be extracted. To resolve this problem, this paper applies the idea of semantic expansion to named entity recognition. Experiments show that, relative to using the unexpanded knowledge base, average recall increases by 1.17% and the F value by 0.41%. In particular, the precision, recall and F value for complicated organization name recognition increase by 0.24%, 1.39% and 0.86%, respectively.
Keywords
character recognition; knowledge based systems; maximum entropy methods; natural language processing; statistical analysis; Chinese named entity recognition; data sparse; information extraction; maximum entropy model; semantic knowledge; statistic-based method; Automation; Computer science; Data mining; Educational institutions; Entropy; Hidden Markov models; Mechatronics; Natural language processing; Probability distribution; Space technology; Chinese Named Entity; Maximum Entropy Model; Semantic Expansion
fLanguage
English
Publisher
IEEE
Conference_Titel
Mechatronics and Automation, 2009. ICMA 2009. International Conference on
Conference_Location
Changchun
Print_ISBN
978-1-4244-2692-8
Electronic_ISBN
978-1-4244-2693-5
Type
conf
DOI
10.1109/ICMA.2009.5246408
Filename
5246408