Title :
Support Vector Machine (SVM) based classifier for Khmer Printed Character-set Recognition
Author :
Pongsametrey Sok ; Nguonly Taing
Author_Institution :
Royal University of Phnom Penh, Phnom Penh, Cambodia
Abstract :
This paper describes the use of a Support Vector Machine (SVM) based classification method for Khmer Printed Character-set Recognition (PCR) in bitmap documents. The Khmer language has been identified as one of the most complex languages, with a total of 74 letters and word compounds that can have up to 5 vertical levels. This paper proposes a new SVM-based method for Khmer character classification that uses 3 different SVM kernels (Gaussian, polynomial and linear) for training and recognition in order to find the best kernel for the Khmer language. The method allows a small training dataset to be used by training on separate pieces of characters instead of on a large number of clusters. The classification uses binary data, with 0 for white space and 1 for the black pixel area of the character; each training piece of a character is stretched into a matrix of binary data, regardless of the image size. Features are extracted from this matrix for use in SVM classification. After recognition, a set of rules combines the pieces into clusters or characters by using character levels or common mistake correction. An experiment on about 750 pure Khmer words, or around 3000 characters, shows that the SVM method with the Gaussian kernel produces good results and performs best among the three kernels. The system uses a single font, "Khmer OS Content", at 32pt for the training data and recognizes 3 different font sizes. The accuracy is 98.17% at 28pt, 98.62% at 32pt and 98.54% at 36pt.
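The preprocessing and kernel comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 16x16 target size, the nearest-neighbour stretching, the use of raw pixels as features, and the kernel parameters (`degree`, `coef0`, `gamma`) are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def normalize_glyph(bitmap, size=16):
    """Stretch a binary glyph (rows of 0/1) to a size x size matrix.

    Nearest-neighbour resampling is an assumption; the abstract only says
    each character piece is stretched into a binary matrix.
    """
    rows, cols = len(bitmap), len(bitmap[0])
    return [
        [bitmap[r * rows // size][c * cols // size] for c in range(size)]
        for r in range(size)
    ]


def to_feature_vector(matrix):
    """Flatten the binary matrix row by row (raw pixels as features)."""
    return [float(v) for row in matrix for v in row]


def linear_kernel(x, y):
    """K(x, y) = x . y"""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=3, coef0=1.0):
    """K(x, y) = (x . y + coef0) ** degree"""
    return (linear_kernel(x, y) + coef0) ** degree


def gaussian_kernel(x, y, gamma=0.05):
    """K(x, y) = exp(-gamma * ||x - y||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Two toy 2x2 "glyph pieces": 1 = black pixel, 0 = white space.
piece_a = to_feature_vector(normalize_glyph([[1, 0], [0, 1]], size=4))
piece_b = to_feature_vector(normalize_glyph([[1, 1], [0, 0]], size=4))

for name, kernel in [
    ("linear", linear_kernel),
    ("polynomial", polynomial_kernel),
    ("gaussian", gaussian_kernel),
]:
    print(name, kernel(piece_a, piece_b))
```

In a full system these kernel functions would be used inside an SVM trainer; the sketch only shows how pieces of different sizes end up as fixed-length binary vectors and how the three kernels measure their similarity.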
Keywords :
character recognition; feature extraction; support vector machines; Gaussian kernel; Khmer OS content; Khmer character classification system; Khmer printed character-set recognition; Linear kernels; PCR; SVM based classifier; SVM classification; SVM kernels; SVM method; character training; polynomial kernels; support vector machine; Character recognition; Decision support systems; Feature extraction; Kernel; Optical character recognition software; Support vector machines; Training; Khmer OCR; Khmer Unicode; Optical Character Recognition; SVM;
Conference_Title :
Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA)
Conference_Location :
Siem Reap
DOI :
10.1109/APSIPA.2014.7041823