DocumentCode
311134
Title
A language model based on semantically clustered words in a Chinese character recognition system
Author
Lee, Hsi-Jian ; Tung, Cheng-Huang
Author_Institution
Dept. of Comput. Sci. & Inf. Eng., Nat. Chiao Tung Univ., Hsinchu, Taiwan
Volume
1
fYear
1995
fDate
14-16 Aug 1995
Firstpage
450
Abstract
This paper presents a new method for clustering the words in a dictionary into word groups, which are used in a Chinese character recognition system with a language model that describes contextual information. The Chinese synonym dictionary Tong2yi4ci2 ci2lin2, which provides the semantic features, is used to train the weights of the semantic attributes of the character-based word classes. These weights are then updated according to the words of the behavior dictionary, which has a rather complete word set. The updated word classes are then clustered into m groups by a greedy method according to a semantic measurement. Finally, the words in the behavior dictionary are assigned to the m groups. The parameter space for the bigram contextual information of the character recognition system is of size m^2. In the experiments, the recognition system with the proposed model performed better than one using a character-based bigram language model.
Keywords
character recognition; computational linguistics; Chinese character recognition; Chinese synonym dictionary; Tong2yi4ci2 ci2lin2; behavior dictionary; character recognition system; language model; semantic attributes; semantically clustered words; Character recognition; Computer science; Context modeling; Dictionaries; Error correction; Natural languages; Postal services; Random access memory
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on
Conference_Location
Montreal, Que.
Print_ISBN
0-8186-7128-9
Type
conf
DOI
10.1109/ICDAR.1995.599033
Filename
599033