  • DocumentCode
    1864526
  • Title
    OCRMPD: OCR system for Myanmar printed document image with a novel segmentation method and hierarchical classification scheme
  • Author
    Win, Htwe Pa Pa; Khine, Phyo Thu Thu; Tun, Khin Nwe Ni
  • Author_Institution
    Univ. of Comput. Studies, Yangon, Myanmar
  • fYear
    2011
  • fDate
    25-27 Aug. 2011
  • Firstpage
    285
  • Lastpage
    291
  • Abstract
    As large quantities of document images are being archived by digital libraries, an efficient strategy for converting Myanmar document images into machine-understandable text is needed. Because the Myanmar language contains many words and most of them look similar, especially in small fonts, the accuracy of Optical Character Recognition (OCR) systems for Myanmar may be low. Therefore, this paper designs an OCR system for Myanmar Printed Documents (OCRMPD), with several proposed methods that automatically convert Myanmar printed text into machine-understandable text. To make the system more accurate, the input image is enhanced by removing noise and correcting variants. A method for isolating character images is proposed that applies connected component analysis to characters wrongly segmented by projection alone. Finally, a hierarchical mechanism is used with the SVM classifier to recognize the character images. The proposed algorithms have been tested on a variety of Myanmar printed documents, and the experimental results indicate that the methods can increase both segmentation accuracy and recognition rates.
  • Keywords
    digital libraries; document image processing; image classification; image denoising; image segmentation; language translation; natural language processing; optical character recognition; support vector machines; text analysis; Myanmar language; Myanmar printed text conversion; OCR system for Myanmar printed document image; OCRMPD image; SVM classifier; character image recognition; digital library; hierarchical classification scheme; image segmentation method; machine understandable text format; noise removal; optical character recognition; Accuracy; Character recognition; Feature extraction; Image segmentation; Optical character recognition software; Support vector machines; Text recognition; Myanmar scripts; OCR; OCRMPD; Support vector machine; machine printed;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Computer Communication and Processing (ICCP), 2011 IEEE International Conference on
  • Conference_Location
    Cluj-Napoca
  • Print_ISBN
    978-1-4577-1479-5
  • Electronic_ISBN
    978-1-4577-1481-8
  • Type
    conf
  • DOI
    10.1109/ICCP.2011.6047882
  • Filename
    6047882