DocumentCode :
2145890
Title :
Localization of Digit Strings in Farsi/Arabic Document Images Using Structural Features and Syntactical Analysis
Author :
Abedi, Ali ; Faez, Karim
Author_Institution :
Electr. Eng. Dept., Amirkabir Univ. of Technol., Tehran, Iran
fYear :
2011
fDate :
18-21 Sept. 2011
Firstpage :
728
Lastpage :
733
Abstract :
This paper presents a new method for localizing digit strings with a specific syntax in Farsi/Arabic document images. First, features are extracted from all connected components in each text line. These features are designed for Farsi/Arabic scripts and can discriminate between digit and non-digit connected components. The features are then classified, and each connected component is assigned its probabilities of belonging to each of four classes: digit, slash, double-digit, and non-digit. Next, a discrete hidden Markov model, acting as a syntactic analyzer, localizes digit strings with the desired syntaxes. Results are reported separately for handwritten and machine-printed text lines and are very promising.
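The abstract describes a two-stage pipeline. The first stage assigns each connected component a class probability. The second stage verifies the syntax with a discrete HMM. The sketch below illustrates the second stage. It is built on stated assumptions and is not the authors' implementation:

- It quantizes each component's probability vector to its most likely class (argmax) to obtain discrete observation symbols.
- It hard-codes a hypothetical left-to-right model for a date-like syntax `G/G/G`, where `G` is a run of digit or double-digit components.
- It uses illustrative, untrained transition and emission probabilities.
- It localizes the string by Viterbi decoding.

All names (`STATES`, `TRANS`, `EMIT`, `quantize`, `viterbi`, `localize`) and all numeric values are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

# Discrete observation symbols: index of the most probable class per component.
DIGIT, SLASH, DOUBLE, NON = 0, 1, 2, 3

# Hypothetical left-to-right model for the syntax G/G/G (e.g. a date),
# where G is a run of digit or double-digit components.
STATES = ["bg_before", "g1", "s1", "g2", "s2", "g3", "bg_after"]
START = {"bg_before": 0.9, "g1": 0.1}
FINAL = ("bg_before", "g3", "bg_after")  # tuple: deterministic tie-breaking

# Illustrative, untrained transition probabilities P(next | current).
TRANS: Dict[str, Dict[str, float]] = {
    "bg_before": {"bg_before": 0.9, "g1": 0.1},
    "g1": {"g1": 0.5, "s1": 0.5},
    "s1": {"g2": 1.0},
    "g2": {"g2": 0.5, "s2": 0.5},
    "s2": {"g3": 1.0},
    "g3": {"g3": 0.5, "bg_after": 0.5},
    "bg_after": {"bg_after": 1.0},
}

# Illustrative emission probabilities P(symbol | state), ordered
# (digit, slash, double-digit, non-digit).
GROUP = [0.6, 0.02, 0.3, 0.08]
SEP = [0.05, 0.85, 0.02, 0.08]
BACKGROUND = [0.1, 0.05, 0.05, 0.8]
EMIT = {"bg_before": BACKGROUND, "g1": GROUP, "s1": SEP, "g2": GROUP,
        "s2": SEP, "g3": GROUP, "bg_after": BACKGROUND}


def log(p: float) -> float:
    return math.log(p) if p > 0 else -math.inf


def quantize(probs: List[List[float]]) -> List[int]:
    """Assumed quantization: keep only the argmax class of each component."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


def viterbi(obs: List[int]) -> List[str]:
    """Most likely state sequence (log domain) that ends in a final state."""
    delta = {s: log(START.get(s, 0.0)) + log(EMIT[s][obs[0]]) for s in STATES}
    back: List[Dict[str, str]] = []
    for o in obs[1:]:
        new: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for s in STATES:
            best_prev, best = STATES[0], -math.inf
            for p in STATES:
                score = delta[p] + log(TRANS[p].get(s, 0.0))
                if score > best:
                    best_prev, best = p, score
            new[s] = best + log(EMIT[s][o])
            ptr[s] = best_prev
        delta = new
        back.append(ptr)
    # Backtrack from the best-scoring state allowed to end the line.
    path = [max(FINAL, key=lambda s: delta[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def localize(probs: List[List[float]]) -> Optional[Tuple[int, int]]:
    """(first, last) component indices of the decoded digit string, or None."""
    if not probs:
        return None
    path = viterbi(quantize(probs))
    inside = [i for i, s in enumerate(path) if not s.startswith("bg_")]
    return (inside[0], inside[-1]) if inside else None
```

Consider a line whose components are classified, in order, as non-digit, non-digit, digit, double-digit, slash, digit, slash, double-digit, digit, non-digit. With the values above, this line decodes to the span `(2, 8)`. A line with no slashes stays in the background state, so `localize` returns `None`.

In practice the transition and emission probabilities would be estimated from training data rather than set by hand. Supporting a different target syntax would mean a different state graph.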
Keywords :
document image processing; handwriting recognition; hidden Markov models; natural language processing; Farsi-Arabic document images; digit strings localization; discrete hidden Markov model; handwritten text lines; machine printed text lines; structural features; syntactical analysis; Feature extraction; Hidden Markov models; Pattern recognition; Support vector machines; Syntactics; Training; Farsi/Arabic document image analysis; digit strings localization; feature extraction; handwritten dates; syntax verification;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location :
Beijing
ISSN :
1520-5363
Print_ISBN :
978-1-4577-1350-7
Electronic_ISBN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2011.152
Filename :
6065407