Title :
Automatic recognition of Chinese place names: a statistical and rule-based combined approach
Author :
Zheng, Jia-heng ; Tan, Hong-ye ; Liu, Kai-Ying ; Zhao, Ying
Author_Institution :
Dept. of Comput. Sci., Shanxi Univ., Taiyuan, China
Abstract :
The automatic recognition of Chinese place names, a special case of the recognition of Chinese special nouns, is an important task in Chinese information processing. In this paper, we propose an approach combining statistical and rule-based techniques. The proposed approach discovers candidates in Chinese texts based on the probability of a character being part of a Chinese place name, and confirms or eliminates the candidates by applying rules obtained by human summarization and transformation-based machine learning. In this approach, we employ a statistical measure, weight of likelihood (WOL), to estimate the likelihood of a character being part of a Chinese place name in real corpora. To the authors' knowledge, this is the first time WOL has been used to capture the capability of a character to form Chinese place names in real corpora. We evaluate the performance of our approach on a real data set; the recall and precision are 97% and 90.92%, respectively.
Keywords :
character recognition; learning (artificial intelligence); text analysis; Chinese information processing; Chinese special nouns; automatic Chinese place name recognition; corpora; human summarization; rule-based techniques; statistical techniques; transformation-based machine learning; weight of likelihood; Books; Computer science; Dictionaries; Humans; Information processing; Information technology; Machine learning; Probability; Text recognition; Weight measurement
Conference_Title :
Systems, Man, and Cybernetics, 2001 IEEE International Conference on
Conference_Location :
Tucson, AZ
Print_ISBN :
0-7803-7087-2
DOI :
10.1109/ICSMC.2001.972883