• DocumentCode
    3740376
  • Title

    Primitive printed Arabic Optical Character Recognition using statistical features

  • Author

    Mohamed Dahi;Noura A. Semary;Mohiy M. Hadhoud

  • Author_Institution
    Information Technology dept., Faculty of computers and information, Menofia university, Shebin El-Kom, Egypt
  • fYear
    2015
  • Firstpage
    567
  • Lastpage
    571
  • Abstract
    Arabic character recognition remains a challenge because characters take different forms across Arabic font types. Most published work considers only one font per text, which results in low recognition accuracy. This paper aims to enhance the accuracy of Arabic Optical Character Recognition (AOCR) by adding an automatic Optical Font Recognition (OFR) stage before the traditional OCR stages. The OFR stage uses SIFT (Scale Invariant Feature Transform) descriptors. First, a comparative study of four recent primitive OCR algorithms was performed to evaluate the features and classifiers used in those systems. Based on that study, a combination of statistical features is proposed, with a Random Forest Tree classifier selected for the classification stage. The combined features are used to train the classifiers, so that text in each recognized font is directed to a specific classifier tree. The proposed system was tested on a generated Primitive Arabic Characters Noise Free (PAC-NF) dataset containing 30,000 samples. Experimental results achieved a promising character recognition accuracy of 99.8-100%.
  • Keywords
    "Optical character recognition software","Shape"
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Computing and Information Systems (ICICIS), 2015 IEEE Seventh International Conference on
  • Print_ISBN
    978-1-5090-1949-6
  • Type

    conf

  • DOI
    10.1109/IntelCIS.2015.7397278
  • Filename
    7397278