DocumentCode :
3060388
Title :
Segmentation of Degraded Malayalam Words: Methods and Evaluation
Author :
Sachan, Devendra ; Dutta, Shrey ; Naveen, T.S. ; Jawahar, C.V.
Author_Institution :
Center for Visual Inf. Technol., IIIT Hyderabad, Hyderabad, India
fYear :
2011
fDate :
15-17 Dec. 2011
Firstpage :
70
Lastpage :
73
Abstract :
In most Optical Character Recognition software, a substantial percentage of errors is caused by the incorrect segmentation of degraded words. This is especially true when recognizing old books, newspapers and historical manuscripts. In this paper, we propose multiple segmentation methods which address the problem of cuts and merges in degraded words. We have created an annotated dataset of 1034 word images with pixel-level ground truth for quantitative evaluation of the methods. We compare the methods with a baseline implementation based on connected component analysis. We report substantial improvement in accuracy at both the character and the word level.
Keywords :
image segmentation; natural language processing; optical character recognition; statistical analysis; baseline implementation; connected component analysis; degraded Malayalam words; historical manuscripts; multiple segmentation methods; newspapers; optical character recognition softwares; pixel level ground truth; Accuracy; Algorithm design and analysis; Databases; Degradation; Image segmentation; Optical character recognition software; Transforms; Character Segmentation; Degradation Correction; Indian Language; Malayalam;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), 2011 Third National Conference on
Conference_Location :
Hubli, Karnataka
Print_ISBN :
978-1-4577-2102-1
Type :
conf
DOI :
10.1109/NCVPRIPG.2011.23
Filename :
6133003