Title :
Extraction of numerical strings in Farsi/Arabic documents using structural features
Author :
Abedi, Ali ; Faez, Karim ; Mozaffari, Saeed
Author_Institution :
Electr. Eng. Dept., Amirkabir Univ. of Technol., Tehran, Iran
Abstract :
In this paper, we present an approach to separating digits from non-digits for numerical string extraction in Farsi/Arabic handwritten or machine-printed document images. Each connected component is labeled according to whether or not it belongs to a numerical string. For this purpose, we introduce a set of features that, first, are based on the maximum difference between digits and non-digits in Farsi and, second, have much lower complexity and extraction time than the features used for connected component recognition. For feature classification, a fuzzy rule-based classifier is used. Experimental results show an acceptable detection rate with a low false positive rate.
Keywords :
document image processing; knowledge based systems; Farsi-Arabic documents; fuzzy rule based classifier; handwritten document image; machine-printed document image; numerical string extraction; structural features; Character recognition; Computational intelligence; Computer industry; Costs; Feature extraction; Image converters; Image retrieval; Information retrieval; Optical character recognition software; Text analysis; Farsi/Arabic document analysis; Fuzzy classifier; Numerical string extraction; Structural features;
Conference_Title :
Computational Intelligence and Industrial Applications, 2009. PACIIA 2009. Asia-Pacific Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-4606-3
DOI :
10.1109/PACIIA.2009.5406445