Title :
Two-Stage Recognition for Printed Thai and English Characters Using Nearest Neighbor and Support Vector Machine
Author :
Wiwatcharakoses, Chayut ; Patanukhom, Karn
Author_Institution :
Dept. of Comput. Eng., Chiang Mai Univ., Chiang Mai, Thailand
Abstract :
In this paper, we introduce a two-stage recognition process for classifying 164 classes of mixed printed Thai and English characters. Various structural features based on image ratios, image projections, outer boundaries, and the Pyramid Histogram of Oriented Gradients (PHOG) are extracted from the images. In the first stage, Fuzzy C-Means clustering (FCM) is applied to create prototypes of every character. The class of the nearest-neighbor prototype is determined and used as the first-stage classification output. For the second stage, a hybrid structure of a nearest neighbor classifier and a Support Vector Machine (SVM) is proposed. Based on the classification results obtained from the first stage, the suitable classifier is selected. For the SVM classifier, the possible class candidates for each prototype are derived from the confusion matrices of the first-stage results. For the nearest neighbor classifier, the result is refined by an accurate search over a limited set of training samples corresponding to the nearest prototypes obtained in the first stage. In experiments on a data set of more than 500,000 character images with various font styles, sizes, and resolutions, we obtain an accuracy of 88.09% in the first stage, which improves to 97.06% in the second stage. The experiments also show that the proposed scheme improves on conventional schemes.
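The nearest-neighbor branch of the two-stage flow described above might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the prototypes are taken as given (the paper builds them with FCM), the SVM branch and the confusion-matrix candidate analysis are omitted, and every name, the toy 2-D features, and the shortlist size k are hypothetical.

```python
import math

def rank_prototypes(x, prototypes):
    """Return prototype indices sorted by Euclidean distance to x."""
    return sorted(range(len(prototypes)),
                  key=lambda i: math.dist(x, prototypes[i]))

def two_stage_predict(x, prototypes, proto_labels, samples, k=2):
    """Hypothetical two-stage nearest-neighbor classifier.

    samples: list of (feature_vector, label, prototype_index), where
    prototype_index is the prototype each training sample is attached to.
    k: assumed number of nearest prototypes kept for the refined search.
    """
    ranked = rank_prototypes(x, prototypes)
    # Stage 1: coarse decision from the single nearest prototype.
    stage1 = proto_labels[ranked[0]]
    # Stage 2: exact nearest-neighbor search, restricted to the training
    # samples attached to the k nearest prototypes.
    shortlist = set(ranked[:k])
    candidates = [(f, y) for f, y, p in samples if p in shortlist]
    if not candidates:
        return stage1, stage1  # nothing to refine with
    _, stage2 = min(candidates, key=lambda c: math.dist(x, c[0]))
    return stage1, stage2

# Toy data (made up): three prototypes for classes "a", "b", "c".
prototypes = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
proto_labels = ["a", "b", "c"]
samples = [
    ((0.0, 0.0), "a", 0),
    ((0.6, 0.0), "a", 1),  # an "a" sample that lies closer to prototype "b"
    ((1.0, 0.0), "b", 1),
    ((1.1, 0.1), "b", 1),
    ((5.0, 5.0), "c", 2),
]

stage1, stage2 = two_stage_predict((0.62, 0.0), prototypes, proto_labels, samples)
print(stage1, stage2)
```

In this toy case the query is closest to the "b" prototype, so the first stage answers "b", while the refined search finds the nearby "a" training sample and answers "a". That is the kind of correction the second stage is meant to provide.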
Keywords :
character recognition; feature extraction; image classification; matrix algebra; pattern clustering; support vector machines; FCM; PHOG; SVM classifier; class classification; confusion matrices; fuzzy c mean clustering; hybrid structure; image projections; image ratios; nearest neighbor classifier; nearest neighbor prototype; outer boundaries; printed English characters; printed Thai characters; pyramid histogram of oriented gradients; structural feature extraction; support vector machine; two-stage recognition; Image edge detection; Prototypes; Training; Training data; OCR; SVM; Thai characters; hybrid classifier; nearest neighbor
Conference_Title :
Signal-Image Technology & Internet-Based Systems (SITIS), 2013 International Conference on
Conference_Location :
Kyoto
DOI :
10.1109/SITIS.2013.23