Title :
Document Image Indexing Using Edit Distance Based Hashing
Author :
Hassan, Ehtesham ; Chaudhury, Santanu ; Gopal, M.
Author_Institution :
Dept. of Electr. Eng., Indian Inst. of Technol., New Delhi, India
Abstract :
We present a novel word-image-based document indexing scheme that combines string matching and hashing. Each word image is represented by a string code obtained through unsupervised learning over graphical primitives. The indexing framework is built on a distance-based hashing function that projects objects into a hash space while preserving the distances between them. Edit-distance-based string matching is used both to define the hashing function and to perform approximate nearest neighbor retrieval. We apply the proposed indexing framework to two document image collections in the Devanagari and Bengali scripts.
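The abstract does not give the exact form of the hashing function, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the line-projection construction commonly used for distance-based hashing: a pair of pivot strings defines a one-dimensional projection computed purely from edit distances, and each projection is thresholded at its median over the data to give one hash bit. The pivot pairs, the median thresholds, the toy string codes, and the names `line_projection`, `DistanceBasedHash`, `build_index`, and `query` are all hypothetical.

```python
from statistics import median


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = curr
    return prev[-1]


def line_projection(x: str, p1: str, p2: str) -> float:
    """Assumed DBH-style projection of x onto the 'line' through pivots p1, p2,
    computed only from pairwise edit distances."""
    d12 = edit_distance(p1, p2)
    if d12 == 0:
        raise ValueError("pivots must be distinct strings")
    return (edit_distance(x, p1) ** 2 + d12 ** 2
            - edit_distance(x, p2) ** 2) / (2 * d12)


class DistanceBasedHash:
    """One hash bit per hypothetical pivot pair, thresholded at the median projection."""

    def __init__(self, pivot_pairs, data):
        self.pivot_pairs = list(pivot_pairs)
        # Median threshold: roughly half of the indexed strings map to each bit value.
        self.thresholds = [median(line_projection(x, p1, p2) for x in data)
                           for p1, p2 in self.pivot_pairs]

    def key(self, x: str) -> tuple:
        return tuple(int(line_projection(x, p1, p2) > t)
                     for (p1, p2), t in zip(self.pivot_pairs, self.thresholds))


def build_index(h: DistanceBasedHash, words):
    """Group string codes into buckets keyed by their hash bits."""
    table = {}
    for w in words:
        table.setdefault(h.key(w), []).append(w)
    return table


def query(h: DistanceBasedHash, table, q: str):
    """Approximate NN: rank only the query's bucket by exact edit distance."""
    candidates = table.get(h.key(q), [])
    return min(candidates, key=lambda w: edit_distance(q, w), default=None)


if __name__ == "__main__":
    # Toy string codes standing in for word-image codes (hypothetical data).
    words = ["abcab", "abcbb", "bbcab", "ccaab", "cacab", "aaaaa", "abcac", "bcbcb"]
    h = DistanceBasedHash([("aaaaa", "ccccc"), ("abcab", "bcbcb")], words)
    table = build_index(h, words)
    print(query(h, table, "abcab"))  # -> abcab (an indexed code finds itself)
    print(query(h, table, "abcbc"))  # nearest code within the query's bucket
```

Because only the query's bucket is searched, the true nearest neighbor can be missed when it hashes to a different bucket; that is the accuracy-for-speed trade-off implied by "approximate" retrieval, and a fuller design would typically use several independent hash tables.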
Keywords :
cryptography; file organisation; image representation; image retrieval; indexing; string matching; unsupervised learning; word processing; Bengali script; Devanagari script; approximate nearest neighbor based retrieval; document image collections; edit distance based hashing; graphical primitives; hash space; hashing function; object projection; string codes; word image based document indexing scheme; Equations; Image representation; Image segmentation; Indexing; Shape; Text analysis; Distance based hashing; Document image indexing; Edit distance; Shape descriptor
Conference_Title :
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4577-1350-7
ISSN :
1520-5363
DOI :
10.1109/ICDAR.2011.242