DocumentCode
643374
Title
Comparison of different classifiers for script identification from handwritten document
Author
Obaidullah, Sk Md ; Roy, Kaushik ; Das, Niladri
fYear
2013
fDate
26-28 Sept. 2013
Firstpage
1
Lastpage
6
Abstract
For a multi-script, multilingual country like India, script identification is a complex real-life problem in the automation of document processing. Handwritten script identification is considerably more complex than its printed counterpart. Here, scripts in multi-script handwritten documents are identified, and performance is then compared across several well-known classifiers. We follow a two-stage approach. First, we identify six scripts, readily available to us in handwritten form, that are used to write six official languages of India. Using abstract/mathematical features, structure-based features and script-dependent features at the document level, a 41-dimensional feature set is prepared. Then a series of classifiers, namely Logistic Model Tree, Random Forest, Multi Layer Perceptron, Sequential Minimal Optimization, LibLINEAR, RBFNetwork and Fuzzy Unordered Rule Induction Algorithm, is applied to the feature set to classify the six handwritten scripts, and the results are compared. Among these classifiers, Logistic Model Tree shows the highest accuracy, 91.2% with 5-fold cross-validation, whereas the SMO model has the lowest convergence time, 0.05 s.
Keywords
document image processing; fuzzy set theory; image classification; multilayer perceptrons; radial basis function networks; text analysis; India; LibLINEAR; RBFNetwork; SMO model; abstract-mathematical features; classifier comparison; dimensional feature set; document processing automation; fuzzy unordered rule induction algorithm; handwritten script identification; logistic model tree; multilayer perceptron; multilingual country; multiscript country; multiscript handwritten documents; random forest; script dependent features; script identification; sequential minimal optimization; structure based features; Accuracy; Computers; Feature extraction; Fractals; Logistics; Neurons; Optical character recognition software; Classifier; Handwritten Script Identification; Optical Character Recognizer; Weka;
fLanguage
English
Publisher
IEEE
Conference_Titel
Signal Processing, Computing and Control (ISPCC), 2013 IEEE International Conference on
Conference_Location
Solan
Print_ISBN
978-1-4673-6188-0
Type
conf
DOI
10.1109/ISPCC.2013.6663388
Filename
6663388