• DocumentCode
    2446167
  • Title
    Preparing Persian/Arabic Scanned Images for OCR
  • Author
    Shirali-Shahreza, Sajad ; Manzuri-Shalmani, M.T. ; Shirali-Shahreza, M. Hassan
  • Author_Institution
    Dept. of Comput. Eng., Sharif Univ. of Tech., Tehran
  • Volume
    1
  • fYear
    0
  • fDate
    0-0 0
  • Firstpage
    1332
  • Lastpage
    1336
  • Abstract
    Digital documents are widely used today, so converting written documents such as books into digital form is unavoidable. The most popular method for doing this is optical character recognition (OCR): documents are usually scanned, and the scanned images are then passed to an OCR system. Scanned images need some preprocessing before OCR can use them efficiently. In this paper, we introduce a method for preparing scanned Persian/Arabic printed texts for OCR. Our method considers special features of Persian/Arabic scripts, such as dots and connected characters. The main phases of our work are converting the grayscale image to a binary image, removing straight lines and frames, and identifying picture components.
  • Keywords
    document image processing; image segmentation; optical character recognition; pattern recognition; Persian/Arabic scanned images; binary image; digital documents; grayscale image; image processing; optical character recognition; page segmentation; pattern recognition; picture components identification; written document conversion; Books; Character recognition; Electrostatic precipitators; Gray-scale; Image converters; Image processing; Image segmentation; Joining processes; Natural languages; Optical character recognition software; Arabic/Persian Document; Image Processing; OCR; Page Segmentation; Pattern Recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information and Communication Technologies, 2006. ICTTA '06. 2nd
  • Conference_Location
    Damascus
  • Print_ISBN
    0-7803-9521-2
  • Type
    conf
  • DOI
    10.1109/ICTTA.2006.1684574
  • Filename
    1684574
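
The abstract names grayscale-to-binary conversion and straight-line removal as preprocessing phases but does not say how they are carried out. The following is a minimal sketch of those two phases only, under stated assumptions: it uses Otsu's global threshold and a horizontal run-length test, neither of which is confirmed to be the authors' method. The function names and the `min_len` parameter are hypothetical. Because Persian/Arabic letters join along a baseline, connected words also produce fairly long horizontal runs, so `min_len` would need to be well above the widest expected word.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale values in 0..255


def otsu_threshold(gray: Image) -> int:
    """Return the threshold t that maximizes between-class variance
    for the two classes (value <= t) and (value > t)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # intensity sum of the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(gray: Image, t: int) -> Image:
    """Map dark pixels (value <= t) to 1 (ink) and the rest to 0 (paper)."""
    return [[1 if v <= t else 0 for v in row] for row in gray]


def remove_long_horizontal_runs(binary: Image, min_len: int) -> Image:
    """Erase every horizontal run of ink at least min_len pixels long.
    Hypothetical stand-in for ruling-line and frame removal."""
    out = [row[:] for row in binary]
    for row in out:
        x, n = 0, len(row)
        while x < n:
            if row[x] == 1:
                start = x
                while x < n and row[x] == 1:
                    x += 1
                if x - start >= min_len:
                    for k in range(start, x):
                        row[k] = 0
            else:
                x += 1
    return out


if __name__ == "__main__":
    page = [
        [220] * 8,                                # blank paper
        [20] * 8,                                 # a ruling line
        [220, 20, 20, 220, 220, 20, 220, 220],    # short text-like strokes
    ]
    t = otsu_threshold(page)
    cleaned = remove_long_horizontal_runs(binarize(page, t), min_len=6)
    for row in cleaned:
        print("".join("#" if p else "." for p in row))
```

A global threshold is the simplest choice here; unevenly lit scans would likely need a local or adaptive threshold instead. Vertical lines could be handled by running the same test on the transposed image.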