• DocumentCode
    2563802
  • Title
    Autonomously normalized horizontal differentials as features for HMM-based Omni font-written OCR systems for cursively scripted languages
  • Author
    Attia, Mohamed ; Rashwan, Mohsen A A ; El-Mahallawy, Mohamed S M
  • Author_Institution
    Eng. Co. for the Dev. of Comput. Syst., RDI, Egypt
  • fYear
    2009
  • fDate
    18-19 Nov. 2009
  • Firstpage
    185
  • Lastpage
    190
  • Abstract
    Automatic font-written Optical Character Recognition (OCR) is highly desirable for numerous modern information technology (IT) applications. Reliable font-written OCR systems for Latin scripts have long been in use. For cursively scripted languages, which are the mother tongues of over one fourth of the world population, however, OCR systems with robust and reliable performance are not yet available. The main challenge is the mandatory connectivity of characters/ligatures (i.e. graphemes), which has to be resolved simultaneously with the recognition of these graphemes. Among the various approaches tried over decades, Hidden Markov Model (HMM)-based OCR systems seem to be the most promising, as they capitalize on the ability of HMM decoders to achieve segmentation and recognition simultaneously, similar to the widely used HMM-based automatic speech recognition (ASR). Unlike in ASR, what is missing in HMM-based OCR is a rigorously founded feature vector capable of robustly achieving minimal “font type/size-independent” (omnifont) word error rates comparable to those realized with Latin scripts. This paper contributes such a feature vector design and experimentally shows its superiority in this regard.
  • Keywords
    hidden Markov models; image segmentation; optical character recognition; HMM-based omni font; cursively scripted languages; font-written OCR systems; grapheme recognition; image recognition; normalized horizontal differentials; Application software; Automatic speech recognition; Character recognition; Error analysis; Image processing; Optical character recognition software; Robustness; Signal processing; Writing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Signal and Image Processing Applications (ICSIPA), 2009 IEEE International Conference on
  • Conference_Location
    Kuala Lumpur
  • Print_ISBN
    978-1-4244-5560-7
  • Type
    conf
  • DOI
    10.1109/ICSIPA.2009.5478619
  • Filename
    5478619