Preparing Persian/Arabic Scanned Images for OCR

Author

Shirali-Shahreza, Sajad ; Manzuri-Shalmani, M.T. ; Shirali-Shahreza, M. Hassan

Author_Institution

Dept. of Comput. Eng., Sharif Univ. of Tech., Tehran

Volume

1

fYear

0

fDate

0-0 0

Firstpage

1332

Lastpage

1336

Abstract

Digital documents are widely used today, so converting written documents such as books into digital form has become unavoidable. The most popular method for doing this is optical character recognition (OCR): documents are usually scanned, and the scanned images are then passed to an OCR system. Scanned images need some preprocessing before OCR can use them efficiently. In this paper, we introduce a method for preparing scanned Persian/Arabic printed texts for OCR. Our method takes into account special features of Persian/Arabic scripts, such as dots and connecting characters. The main phases of our work are converting the grayscale image to a binary image, removing straight lines and frames, and identifying picture components.

Keywords

document image processing; image segmentation; optical character recognition; pattern recognition; Persian/Arabic scanned images; binary image; digital documents; grayscale image; image processing; page segmentation; picture components identification; written document conversion; Books; Character recognition; Electrostatic precipitators; Gray-scale; Image converters; Image processing; Image segmentation; Joining processes; Natural languages; Optical character recognition software; Arabic/Persian Document; Image Processing; OCR; Page Segmentation; Pattern Recognition;

fLanguage

English

Publisher

IEEE

Conference_Titel

Information and Communication Technologies, 2006. ICTTA '06. 2nd

Conference_Location

Damascus

Print_ISBN

0-7803-9521-2

Type

conf

DOI

10.1109/ICTTA.2006.1684574

Filename

1684574