Title :
Word Extraction Based on Mutual Information for Ancient Chinese Database
Author :
Li, Xin-fu ; Zhao, Jie ; Sun, Hao-jun
Author_Institution :
Fac. of Math. & Comput. Sci., Hebei Univ., Baoding
Abstract :
Word extraction is an important issue in archaic Chinese corpus processing. In this paper, we focus on extracting multi-character words for an archaic Chinese database. Candidate words are extracted based on the associative strength of their characters, and it is up to the user to judge whether a candidate is a real word. After a word is extracted, the mutual information related to it is updated accordingly, and words are extracted recursively. The empirical results show that the mutual information method is an effective auxiliary approach to extracting multi-character words for an archaic Chinese database.
Keywords :
database management systems; natural languages; statistical analysis; word processing; archaic Chinese database; mutual information; word extraction; Books; Computer science; Cybernetics; Data mining; Feature extraction; Frequency; Machine learning; Mathematics; Spatial databases; Sun; statistical feature
Conference_Title :
Machine Learning and Cybernetics, 2006 International Conference on
Conference_Location :
Dalian, China
Print_ISBN :
1-4244-0061-9
DOI :
10.1109/ICMLC.2006.258918
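Illustration :
The loop described in the abstract (score adjacent characters by mutual information, let the user confirm a candidate, update the statistics, repeat) can be sketched as follows. This is a minimal sketch, not the authors' implementation: the abstract does not give the exact formula or update rule, so the code assumes standard pointwise mutual information, log2(p(xy) / (p(x) p(y))), and recomputes all counts after each accepted merge. The names `pmi_table`, `merge`, `extract_words`, `accept`, `threshold`, and `min_count` are hypothetical.

```python
import math
from collections import Counter


def pmi_table(seqs, min_count=2):
    """Pointwise mutual information of adjacent unit pairs (assumed scoring)."""
    unit = Counter(u for s in seqs for u in s)
    pair = Counter(p for s in seqs for p in zip(s, s[1:]))
    n_unit = sum(unit.values())
    n_pair = sum(pair.values())
    table = {}
    for (a, b), c in pair.items():
        if c >= min_count:  # skip rare pairs, whose PMI is unreliable
            p_ab = c / n_pair
            p_a, p_b = unit[a] / n_unit, unit[b] / n_unit
            table[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return table


def merge(seq, a, b):
    """Replace each adjacent occurrence of a, b in seq with the unit a + b."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def extract_words(lines, accept, threshold=1.0, min_count=2):
    """Propose the best-scoring pair, ask accept(), merge, and repeat.

    accept stands in for the user's judgement; an accepted word becomes
    a single unit, so later rounds can extend it into a longer word.
    """
    seqs = [list(line) for line in lines]
    words, rejected = [], set()
    while True:
        table = pmi_table(seqs, min_count)
        cands = [(score, p) for p, score in table.items()
                 if score >= threshold and p not in rejected]
        if not cands:
            return words
        _, (a, b) = max(cands)
        if accept(a + b):
            words.append(a + b)
            seqs = [merge(s, a, b) for s in seqs]  # statistics change next round
        else:
            rejected.add((a, b))


lines = ["天下之人", "天下大事", "治天下者", "人之事"]
print(extract_words(lines, accept=lambda w: w == "天下"))  # ['天下']
```

In an interactive setting, `accept` would prompt the user instead of using a fixed rule. Recomputing the whole table after every merge keeps the sketch short; an incremental update of only the affected counts would be the natural optimization for a large corpus.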