Title :
Chinese personal name recognition using N-gram model and rules
Author :
Chen Lin ; Zhang Hui ; Li Zhen'an
Author_Institution :
State Key Lab. of Software Dev. Environ., Beihang Univ., Beijing, China
Abstract :
Chinese personal name recognition plays an important role in Chinese word segmentation, but the complexity of Chinese names makes it difficult to decide whether a sequence of characters is a name. This paper presents a new algorithm based on an N-gram model and recognition rules to address this problem. To increase efficiency and accuracy, we also build several dictionaries, such as a surname dictionary and a person-name dictionary. Experiments on different corpora show that the improved tokenizer using this algorithm performs stably and improves word segmentation accuracy by more than 10 percent over the original one. On average, the improved tokenizer's recall rate and accuracy rate are both over 92%.
Keywords :
natural language processing; pattern recognition; text analysis; word processing; Chinese personal name recognition; Chinese word segmentation; N-gram model; characters sequence; person-name dictionary; recognition rules; surname dictionary; tokenizer
Conference_Title :
Computing and Convergence Technology (ICCCT), 2012 7th International Conference on
Conference_Location :
Seoul
Print_ISBN :
978-1-4673-0894-6