DocumentCode
3695296
Title
Document indexing framework for retrieval of degraded document images
Author
Ritu Garg;Ehtesham Hassan;Santanu Chaudhury
Author_Institution
Department of Electrical Engineering, Indian Institute of Technology, New Delhi, India
fYear
2015
Firstpage
1261
Lastpage
1265
Abstract
With the availability of large collections of document images in Indian languages, image-based retrieval has gained popularity. The performance of such systems is affected by the presence of degraded and noisy images. Moreover, optical character recognition (OCR) systems for Indian scripts are not yet robust, leading to noisy OCR'ed text. An information retrieval system designed using inputs from both modalities (image features and OCR-based recognition data) will lead to better retrieval performance than one that uses either modality alone. In this paper we present an indexing methodology that uses multiple kernel learning to combine features from different modalities by joint optimization of search time and accuracy. The proposed methodology is evaluated on document images in Bangla and Devanagari scripts.
Keywords
"Kernel","Image segmentation","Optical character recognition software","Indexing"
Publisher
ieee
Conference_Titel
Document Analysis and Recognition (ICDAR), 2015 13th International Conference on
Type
conf
DOI
10.1109/ICDAR.2015.7333966
Filename
7333966
Link To Document