• DocumentCode
    1796169
  • Title

    Efficient recognition of machine printed Arabic text using partial segmentation and Hausdorff distance

  • Author

    Saabni, Raid

  • Author_Institution
    Dept. of Comput. Sci., Tel-Aviv Yaffo Acad. Coll., Kafr Qara, Israel
  • fYear
    2014
  • fDate
    11-14 Aug. 2014
  • Firstpage
    284
  • Lastpage
    289
  • Abstract
    There is an urgent need for reliable and efficient systems for off-line automatic reading of machine printed Arabic text. Applications of such a system include searching and reading scanned books and manuscripts in digital libraries, and recognizing text on digitized maps, vehicle license plates, road signs, and similar sources. This work addresses the recognition of machine printed Arabic text using a partial segmentation process and the Hausdorff distance. The process analyses the layout of the image and segments it into words and Parts of Words (PAWs). The Stroke Width Transform (SWT) is used to estimate the font and its size in order to define a set of multi-size sliding windows that search for and identify characters within the given shape of a PAW. The process evaluates the similarity of the two sub-images (character and sliding window) using the Hausdorff distance. The top-k ranked candidates and their positions within the PAW are recorded and used to generate a list of full PAW images. In the next step, the elements of this list are matched to the given shape in a holistic manner. We tested our approach on the APTI and PATS-A01 data sets and on a private collection of text images, and obtained encouraging results.
  • Keywords
    image segmentation; natural language processing; optical character recognition; text detection; transforms; Hausdorff distance; PAW; SWT; machine printed Arabic text recognition; off-line automatic reading; optical character recognition; partial segmentation process; parts of words; stroke width transform; Character recognition; Feature extraction; Image segmentation; Optical character recognition software; Shape; Text recognition; Transforms; Arabic OCR; Hausdorff Distance; Partial Segmentation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Soft Computing and Pattern Recognition (SoCPaR), 2014 6th International Conference of
  • Conference_Location
    Tunis
  • Type

    conf

  • DOI
    10.1109/SOCPAR.2014.7008020
  • Filename
    7008020
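
The matching step described in the abstract (comparing a character image with a sliding window via the Hausdorff distance and keeping the top-k candidates) can be sketched as follows. This is a minimal illustration and not the author's implementation: the function names, the binary list-of-lists image representation, the single fixed window size, and the choice to slide only horizontally over the top rows of the PAW are all assumptions made for the example. The SWT-based selection of window sizes is not modelled.

```python
from math import hypot


def foreground_points(image):
    """Return (row, col) coordinates of the nonzero pixels of a 2D list."""
    return [(r, c) for r, row in enumerate(image)
            for c, value in enumerate(row) if value]


def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(hypot(pr - qr, pc - qc) for qr, qc in b) for pr, pc in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def best_windows(paw, template, k=3):
    """Slide a template-sized window across the PAW image (hypothetical helper).

    Assumes the PAW is at least as tall and as wide as the template; the
    window is taken from the top rows and moved one column at a time.
    Returns the k best (distance, column_offset) pairs, smallest first.
    """
    height, width = len(template), len(template[0])
    template_points = foreground_points(template)
    scores = []
    for x in range(len(paw[0]) - width + 1):
        window = [row[x:x + width] for row in paw[:height]]
        window_points = foreground_points(window)
        if window_points:  # skip windows with no ink
            scores.append((hausdorff(window_points, template_points), x))
    return sorted(scores)[:k]


# Toy example: a 2x2 "character" searched for inside a 2x5 "PAW".
template = [[1, 0],
            [1, 1]]
paw = [[0, 0, 1, 0, 0],
       [0, 0, 1, 1, 0]]
candidates = best_windows(paw, template, k=2)
```

In this toy case the best candidate is the window at column offset 2, where the window equals the template and the distance is zero. A full system along the lines of the abstract would repeat this over several window sizes and character templates before assembling and ranking whole-PAW hypotheses.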