Title :
Farsi/Arabic Handwritten from Machine-Printed Words Discrimination
Author :
Mozaffari, Saeed ; Bahar, P.
Author_Institution :
Electr. & Comput. Eng. Dept., Semnan Univ., Semnan, Iran
Abstract :
Separating handwritten text from machine-printed material is a desirable step toward a general document analysis system. In this paper, we propose a simple and effective method to discriminate handwritten from machine-printed words in Farsi/Arabic documents. After word blocks are located, three feature sets are extracted: two well-established features, previously used to separate Latin handwritten from machine-printed text, and a new feature called the baseline profile. The extracted features are then combined into a feature vector with 34 elements. SVM and KNN classifiers are used to separate handwritten and machine-printed words. The method is evaluated on special forms designed for word separation. Experimental results show that the system differentiates handwritten from machine-printed words with an overall accuracy of 97.1%.
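The classification stage described in the abstract (combine the feature sets into one 34-element vector, then classify with KNN) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract gives neither the size of each feature set nor the KNN settings, so the 12/12/10 split, `k=3`, the Euclidean distance, the synthetic Gaussian features, and all function names below are assumptions made only for the example. The SVM classifier is not shown.

```python
import math
import random
from collections import Counter

FEATURE_DIM = 34  # vector length stated in the abstract


def combine_features(*feature_sets):
    """Concatenate per-word feature sets into a single feature vector."""
    vector = [float(v) for fs in feature_sets for v in fs]
    if len(vector) != FEATURE_DIM:
        raise ValueError(f"expected {FEATURE_DIM} features, got {len(vector)}")
    return vector


def knn_predict(train_vectors, train_labels, query, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    neighbours = sorted(
        (math.dist(vec, query), label)
        for vec, label in zip(train_vectors, train_labels)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def synthetic_word(rng, centre):
    """Hypothetical stand-in for real feature extraction.

    Produces three made-up feature sets; the sizes 12, 12 and 10 are an
    assumption chosen only so that they sum to 34.
    """
    sizes = (12, 12, 10)
    feature_sets = [[rng.gauss(centre, 0.5) for _ in range(n)] for n in sizes]
    return combine_features(*feature_sets)


# Synthetic training data: two well-separated clusters, one per class.
rng = random.Random(0)
train_vectors, train_labels = [], []
for _ in range(20):
    train_vectors.append(synthetic_word(rng, 0.0))
    train_labels.append("machine-printed")
    train_vectors.append(synthetic_word(rng, 3.0))
    train_labels.append("handwritten")

query = synthetic_word(rng, 3.0)
print(knn_predict(train_vectors, train_labels, query))
```

In practice the synthetic vectors would be replaced by features measured on real word blocks, and the choice of `k` and distance metric would be tuned on held-out data.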
Keywords :
document image processing; feature extraction; handwriting recognition; learning (artificial intelligence); natural language processing; pattern classification; support vector machines; text analysis; Arabic documents; Arabic handwritten words; Farsi documents; Farsi handwritten words; KNN classifiers; Latin handwritten; SVM; baseline profile; document analysis system; feature sets extraction; feature vector; machine-printed materials; machine-printed text separation; machine-printed words discrimination; word separation; Farsi/Arabic Document Analysis; handwritten from machine-printed discrimination
Conference_Title :
Frontiers in Handwriting Recognition (ICFHR), 2012 International Conference on
Conference_Location :
Bari
Print_ISBN :
978-1-4673-2262-1
DOI :
10.1109/ICFHR.2012.202