DocumentCode
2446167
Title
Preparing Persian/Arabic Scanned Images for OCR
Author
Shirali-Shahreza, Sajad ; Manzuri-Shalmani, M.T. ; Shirali-Shahreza, M. Hassan
Author_Institution
Dept. of Comput. Eng., Sharif Univ. of Tech., Tehran
Volume
1
fYear
2006
fDate
0-0 0
Firstpage
1332
Lastpage
1336
Abstract
Digital documents are widely used today, so converting written documents such as books into digital form is unavoidable. The most popular method for doing this is OCR. Usually documents are scanned, and the scanned images are then passed to OCR. Scanned images need some preprocessing before OCR can use them efficiently. In this paper, we introduce a method for preparing scanned Persian/Arabic printed texts for OCR. Our method considers special features of Persian/Arabic scripts, such as dots and connected characters. The main phases of our work are converting the grayscale image to a binary image, removing straight lines and frames, and identifying picture components.
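The record does not say how the binarization or line-removal phases are carried out. As a generic illustration only, and not the authors' method, the sketch below substitutes two textbook techniques: Otsu's global threshold for the grayscale-to-binary step, and clearing of long horizontal/vertical ink runs for the straight-line and frame removal step. The function names and the `min_run` parameter are hypothetical. The image is assumed to be a list of rows of 0..255 intensities with dark ink on a light background.

```python
def otsu_threshold(gray):
    """Return Otsu's threshold t for a 2-D list of 0..255 intensities."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_b, sum_b = 0, 0  # pixel count and intensity sum of the class [0..t]
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        sum_b += t * hist[t]
        m_b = sum_b / w_b
        m_f = (sum_all - sum_b) / w_f
        # Between-class variance (up to a constant factor); keep the maximum.
        var = w_b * w_f * (m_b - m_f) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(gray, t):
    """Mark dark pixels (<= t) as ink (1) and the rest as background (0)."""
    return [[1 if v <= t else 0 for v in row] for row in gray]


def remove_long_runs(binary, min_run):
    """Clear horizontal and vertical ink runs of length >= min_run.

    min_run (hypothetical) must exceed the longest stroke of real text; in
    Persian/Arabic the connecting baseline strokes also form horizontal runs.
    """
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(h):  # horizontal runs, detected on the input image
        x = 0
        while x < w:
            if binary[y][x]:
                start = x
                while x < w and binary[y][x]:
                    x += 1
                if x - start >= min_run:
                    for i in range(start, x):
                        out[y][i] = 0
            else:
                x += 1
    for x in range(w):  # vertical runs, detected on the input image
        y = 0
        while y < h:
            if binary[y][x]:
                start = y
                while y < h and binary[y][x]:
                    y += 1
                if y - start >= min_run:
                    for i in range(start, y):
                        out[i][x] = 0
            else:
                y += 1
    return out
```

For example, on a page whose top row is a dark ruling line, `remove_long_runs(binarize(gray, otsu_threshold(gray)), min_run)` clears that row while keeping short glyph strokes. A single global threshold is the simplest choice here; uneven illumination in real scans may call for a local or adaptive method instead.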
Keywords
document image processing; image segmentation; optical character recognition; pattern recognition; Persian/Arabic scanned images; binary image; digital documents; grayscale image; image processing; page segmentation; picture components identification; written document conversion; Books; Character recognition; Electrostatic precipitators; Gray-scale; Image converters; Image processing; Image segmentation; Joining processes; Natural languages; Optical character recognition software; Arabic/Persian Document; Image Processing; OCR; Page Segmentation; Pattern Recognition
fLanguage
English
Publisher
ieee
Conference_Titel
Information and Communication Technologies, 2006. ICTTA '06. 2nd
Conference_Location
Damascus
Print_ISBN
0-7803-9521-2
Type
conf
DOI
10.1109/ICTTA.2006.1684574
Filename
1684574