Title :
Shape Code Based Word-Image Matching for Retrieval of Indian Multi-lingual Documents
Author :
Tarafdar, Arundhati ; Mondal, Ranju ; Pal, Srikanta ; Pal, Umapada ; Kimura, Fumitaka
Author_Institution :
Comput. Vision & Pattern Recognition Unit, Indian Stat. Inst., Kolkata, India
Abstract :
Retrieving information from document images remains a challenging problem. In this paper, we propose a shape code based word-image matching (word-spotting) technique for retrieving multilingual documents written in Indian languages. Each query word image is represented by a primitive shape code built from features including (i) zonal information of extreme points, (ii) vertical shape based features, (iii) crossing count (with respect to the vertical bar position), (iv) loop shape and position, and (v) background information. Each candidate word in the document (a word whose aspect ratio and topological features are similar to those of the query word) is coded in the same way. An inexact string matching technique then measures the similarity between the primitive code generated from the query word image and that of each candidate word in the document being searched. Based on the similarity score, we retrieve the documents in which the query word image is found. Experimental results on document image databases in Bangla, Devnagari and Gurumukhi scripts confirm the feasibility and efficiency of the proposed approach.
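The matching step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which inexact string matching technique is used, so plain Levenshtein edit distance stands in for it here. The single-letter code alphabet, the function names (edit_distance, similarity, spot_word) and the 0.7 threshold are all hypothetical, chosen only for illustration, and the feature extraction that produces the shape codes is not shown.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two shape-code strings (unit costs)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def similarity(query_code: str, candidate_code: str) -> float:
    """Similarity in [0, 1]; 1.0 means the two codes are identical."""
    longest = max(len(query_code), len(candidate_code))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(query_code, candidate_code) / longest


def spot_word(query_code, candidate_codes, threshold=0.7):
    """Return (score, code) pairs at or above the threshold, best first.

    The threshold is an assumed value, not one taken from the paper.
    """
    scored = [(similarity(query_code, c), c) for c in candidate_codes]
    return sorted((s for s in scored if s[0] >= threshold), reverse=True)


if __name__ == "__main__":
    # Hypothetical shape codes: one letter per primitive shape.
    print(spot_word("VLCB", ["VLCB", "VLXB", "AAAA"]))
```

A normalized score makes the threshold independent of word length, which is one reasonable design choice; a system like the one described might instead weight substitutions between visually similar primitives more cheaply.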
Keywords :
document image processing; image matching; image retrieval; string matching; Indian multilingual documents; background information; crossing count; document image retrieval; inexact string matching technique; loop position; loop shape; query word image; shape code based word-image matching; vertical shape based feature; word-spotting technique; zonal information; Computer vision; Encoding; Feature extraction; Image coding; Image segmentation; Pattern recognition; Shape; Document image processing; Indian script document image; document image retrieval; shape code; word spotting;
Conference_Title :
Pattern Recognition (ICPR), 2010 20th International Conference on
Conference_Location :
Istanbul
Print_ISBN :
978-1-4244-7542-1
DOI :
10.1109/ICPR.2010.490