DocumentCode
2529231
Title
Converting Myanmar printed document image into machine understandable text format
Author
Win, Htwe Pa Pa ; Khine, Phyo Thu Thu ; Tun, Khin Nwe Ni
Author_Institution
Univ. of Comput. Studies, Yangon, Myanmar
fYear
2011
fDate
26-28 Sept. 2011
Firstpage
96
Lastpage
101
Abstract
As Digital Libraries archive a large number of Myanmar document images, an efficient strategy is needed to convert document images into machine understandable text format. State-of-the-art OCR systems do not work for Myanmar scripts because our language poses many challenges for document understanding. Therefore, this paper presents an OCR system for Myanmar Printed Document (OCRMPD) with several proposed methods that can automatically convert Myanmar printed text to machine understandable text. First, the input image is enhanced by correcting noise variants. Then, the characters are segmented with a novel segmentation method. The features of the isolated characters are extracted with a hybrid feature extraction method to overcome the similarity problems of the Myanmar scripts. Finally, a hierarchical mechanism is used with an SVM classifier to recognize the character images. The experiments are carried out on a variety of Myanmar printed documents, and the results show the efficiency of the proposed algorithms.
Keywords
digital libraries; document image processing; optical character recognition; pattern classification; support vector machines; Myanmar printed document image conversion; OCR systems; OCRMPD; SVM classifier; character image recognition; digital libraries; machine understandable text format; Accuracy; Character recognition; Feature extraction; Image segmentation; Optical character recognition software; Support vector machines; Text recognition; Myanmar scripts; OCRMPD; feature extraction; segmentation; support vector machines;
fLanguage
English
Publisher
IEEE
Conference_Titel
Digital Information Management (ICDIM), 2011 Sixth International Conference on
Conference_Location
Melbourne, QLD
ISSN
Pending
Print_ISBN
978-1-4577-1538-9
Type
conf
DOI
10.1109/ICDIM.2011.6093371
Filename
6093371