Title :
A Markov language model in Chinese text recognition
Author :
Lee, Hsi-Jian ; Tung, Cheng-Huang ; Chien, Che-Hui Chang
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Chiao Tung Univ., Hsinchu, Taiwan
Abstract :
A two-stage Chinese text recognition system is presented. In the first stage, each Chinese character is segmented nonuniformly into 10 strips, both horizontally and vertically. Three statistical features, viz. crossing counts, peripheral background area, and contour line length, are then extracted to form a 60-dimension feature vector. A feature matching method based on the city-block distance metric selects the N nearest neighbors of each input character from the reference template base, which consists of 5,401 frequently used Chinese characters, as that character's candidates. In the second stage, a three-part-of-speech (tri-POS) Markov language model is employed to select the most promising characters from all candidate characters in an input sentence. Dynamic programming is applied to find the sentence hypothesis whose part-of-speech sequence has the maximum likelihood among all candidate sentences for the input sentence. The tri-POS contextual information is estimated from a tagged corpus.
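A minimal Python sketch of the two stages summarized above, offered as an illustration under stated assumptions rather than the authors' implementation. It assumes feature vectors are plain lists of numbers (the 60-dimension feature extraction itself is not reproduced), that the template base is a hypothetical dict from character to reference vector, that each candidate character carries exactly one POS tag, and that unseen tag trigrams receive a hypothetical floor probability. The names `nearest_candidates` and `best_sentence` are illustrative only.

```python
import math


def city_block(a, b):
    """City-block (L1) distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_candidates(feature, templates, n):
    """Return the n template characters closest to `feature`.

    templates: hypothetical dict mapping a character to its reference vector.
    """
    ranked = sorted(templates, key=lambda ch: city_block(feature, templates[ch]))
    return ranked[:n]


def best_sentence(candidates, trigram, start="<s>", floor=1e-6):
    """Second-order (tri-POS) dynamic programming over candidate characters.

    candidates: one list per input position of (char, pos_tag) pairs.
    trigram: dict mapping (tag_{i-2}, tag_{i-1}, tag_i) -> probability.
    floor: hypothetical probability for tag trigrams absent from `trigram`.
    Returns (log-likelihood, chosen characters) of the best hypothesis,
    scored on its POS sequence only.
    """
    # Future scores depend only on the last two tags, so each DP state is
    # keyed by (previous tag, current tag) and keeps the best path reaching it.
    states = {(start, start): (0.0, [])}
    for cands in candidates:
        new_states = {}
        for (t2, t1), (score, chars) in states.items():
            for ch, tag in cands:
                s = score + math.log(trigram.get((t2, t1, tag), floor))
                key = (t1, tag)
                # Strict '>' keeps the first candidate on ties between same-tag characters.
                if key not in new_states or s > new_states[key][0]:
                    new_states[key] = (s, chars + [ch])
        states = new_states
    return max(states.values(), key=lambda v: v[0])
```

Keying each state on the last two tags is what makes the search polynomial: the work per position grows with the number of tag pairs times the candidates, instead of with the number of whole-sentence hypotheses. Because only the POS sequence is scored, candidates that share a tag at the same position tie; a fuller system would presumably also weight the city-block matching distance.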
Keywords :
Markov processes; character recognition; dynamic programming; feature extraction; heuristic programming; model-based reasoning; natural languages; Chinese text recognition; Markov language model; candidate sentences; city-block distance metric; contour line length; crossing counts; feature matching method; feature vector; input sentence; nearest neighbors; nonuniform character segmentation; part-of-speech sequence; peripheral background area; reference template base; sentence hypothesis; statistical features; strips; tagged corpus; tri-POS contextual information; Character recognition; Image segmentation; Maximum likelihood estimation; Natural languages; Nearest neighbor searches; Probability; Spatial databases; Strips; Text recognition; Vocabulary
Conference_Title :
Document Analysis and Recognition, 1993., Proceedings of the Second International Conference on
Conference_Location :
Tsukuba Science City
Print_ISBN :
0-8186-4960-7
DOI :
10.1109/ICDAR.1993.395779