• DocumentCode
    2529231
  • Title

    Converting Myanmar printed document image into machine understandable text format

  • Author

    Win, Htwe Pa Pa ; Khine, Phyo Thu Thu ; Tun, Khin Nwe Ni

  • Author_Institution
    Univ. of Comput. Studies, Yangon, Myanmar
  • fYear
    2011
  • fDate
    26-28 Sept. 2011
  • Firstpage
    96
  • Lastpage
    101
  • Abstract
    As digital libraries archive a growing number of Myanmar document images, an efficient strategy is needed to convert document images into machine-understandable text. State-of-the-art OCR systems cannot handle Myanmar script, as the language poses many challenges for document understanding. This paper therefore proposes an OCR system for Myanmar Printed Document (OCRMPD), which combines several proposed methods to automatically convert Myanmar printed text into machine-understandable text. First, the input image is enhanced by correcting several kinds of noise. Then, the characters are segmented with a novel segmentation method. Features of the isolated characters are extracted with a hybrid feature extraction method to overcome the similarity problems of Myanmar script. Finally, a hierarchical mechanism is applied to the SVM classifier to recognize each character image. Experiments are carried out on a variety of Myanmar printed documents, and the results show the efficiency of the proposed algorithms.
  • Keywords
    digital libraries; document image processing; optical character recognition; pattern classification; support vector machines; Myanmar printed document image conversion; OCR systems; OCRMPD; SVM classifier; character image recognition; digital libraries; machine understandable text format; Accuracy; Character recognition; Feature extraction; Image segmentation; Optical character recognition software; Support vector machines; Text recognition; Myanmar scripts; OCRMPD; feature extraction; segmentation; support vector machines;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Digital Information Management (ICDIM), 2011 Sixth International Conference on
  • Conference_Location
    Melbourne, QLD
  • ISSN
    Pending
  • Print_ISBN
    978-1-4577-1538-9
  • Type

    conf

  • DOI
    10.1109/ICDIM.2011.6093371
  • Filename
    6093371