Title :
Preprocessing and structural feature extraction for a multi-fonts Arabic/Persian OCR
Author :
Kavianifar, Mandana ; Amin, Adnan
Author_Institution :
Sch. of Comput. Sci. & Eng., New South Wales Univ., Kensington, NSW, Australia
Abstract :
English and Chinese have attracted tremendous interest from character recognition researchers. In contrast, character recognition research for Arabic/Persian scripts faces major problems, mainly related to the scripts' unique characteristics: they are cursive, a character takes multiple shapes depending on its position in a word, and characters are connected along the baseline. The work proposed in this paper consists of three major phases. The text is first digitized with a 300-dpi scanner, and the original image is transformed into a gray-scale image. Different pre-processing steps are then applied to the image file. In the next phase, the sub-words of all words are recognized and global features are extracted for each word. Contour tracing plays a very important role in the feature extraction phase.
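The abstract does not say how sub-words are isolated. As a rough illustration only, the sketch below assumes the pre-processed page has already been binarized (1 = ink, 0 = background) and treats each 8-connected group of ink pixels as a candidate sub-word, ordered right to left to follow the reading direction. The function names `label_components`, `bounding_box` and `sub_word_boxes` are hypothetical, and connected-component labeling is a stand-in technique, not necessarily the authors' method. Dots and diacritics come out as separate components here; a real system would need to attach them to their base strokes.

```python
from collections import deque


def label_components(image):
    """Group foreground pixels (value 1) into 8-connected components.

    `image` is a list of equal-length rows of 0/1 values.
    Returns a list of components, each a list of (row, col) pixels.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def bounding_box(pixels):
    """Return (top, left, bottom, right) for a list of (row, col) pixels."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return min(ys), min(xs), max(ys), max(xs)


def sub_word_boxes(image):
    """Bounding boxes of all components, ordered right to left."""
    boxes = [bounding_box(p) for p in label_components(image)]
    return sorted(boxes, key=lambda b: b[3], reverse=True)


if __name__ == "__main__":
    # Toy 3x7 "page" with three disconnected ink blobs.
    page = [
        [0, 1, 1, 0, 0, 0, 1],
        [0, 1, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 1, 0, 0],
    ]
    for box in sub_word_boxes(page):
        print(box)
```

The bounding boxes are one possible starting point for per-sub-word features such as width, height, or position relative to the baseline; which global features the paper actually uses is not stated in the abstract.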
Keywords :
character sets; feature extraction; optical character recognition; 300-dpi image scanner; Arabic scripts; Persian scripts; baseline; character connectivity; character positions; contour tracing; cursive scripts; gray-scale image; image file; image transformation; multi-font OCR; multiple character shapes; preprocessing; structural feature extraction; sub-word recognition; text digitization; word global feature extraction; Africa; Argon; Australia; Character recognition; Computer science; Feature extraction; Natural languages; Optical character recognition software; Pattern recognition; Shape;
Conference_Title :
Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on
Conference_Location :
Bangalore
Print_ISBN :
0-7695-0318-7
DOI :
10.1109/ICDAR.1999.791762