DocumentCode :
2148401
Title :
Document Image Indexing Using Edit Distance Based Hashing
Author :
Hassan, Ehtesham ; Chaudhury, Santanu ; Gopal, M.
Author_Institution :
Dept. of Electr. Eng., Indian Inst. of Technol., New Delhi, India
fYear :
2011
fDate :
18-21 Sept. 2011
Firstpage :
1200
Lastpage :
1204
Abstract :
We present a novel word-image-based document indexing scheme that combines string matching with hashing. Each word image is represented by a string code obtained through unsupervised learning over graphical primitives. The indexing framework is built on a distance-based hashing function, which projects objects into hash space while preserving their distances. Edit-distance-based string matching is used both to define the hashing function and to perform approximate nearest-neighbor retrieval. The proposed framework is applied to two document image collections in Devanagari and Bengali script.
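The abstract does not spell out the hashing function, so the following is only a minimal sketch of how edit-distance-based hashing could work, not the authors' implementation. It assumes the commonly used distance-based hashing construction, in which each hash bit comes from projecting an object onto a "line" defined by two pivot objects and thresholding the result. The function names, the median threshold, the number of bits, and the toy string codes are all hypothetical choices made for illustration.

```python
import random
import statistics


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def line_projection(x, p1, p2):
    """Project x onto the pseudo-line through pivots p1 and p2.

    Uses only pairwise distances, so it works for strings under edit
    distance. Assumes p1 != p2, so the pivot distance is non-zero.
    """
    d12 = edit_distance(p1, p2)
    d1 = edit_distance(x, p1)
    d2 = edit_distance(x, p2)
    return (d1 ** 2 + d12 ** 2 - d2 ** 2) / (2 * d12)


def make_bit_function(words, rng):
    """Build one binary hash function from two randomly chosen pivots.

    Assumption: the threshold is the median projection of the indexed
    words, so each bit splits the collection roughly in half.
    """
    p1, p2 = rng.sample(words, 2)
    threshold = statistics.median(line_projection(w, p1, p2) for w in words)
    return lambda x: int(line_projection(x, p1, p2) >= threshold)


def build_index(words, num_bits=4, seed=0):
    """Hash every distinct word code into a bucket keyed by num_bits bits."""
    rng = random.Random(seed)
    unique = sorted(set(words))
    bit_functions = [make_bit_function(unique, rng) for _ in range(num_bits)]

    def hash_key(x):
        return tuple(f(x) for f in bit_functions)

    buckets = {}
    for w in unique:
        buckets.setdefault(hash_key(w), []).append(w)
    return hash_key, buckets


def query(x, hash_key, buckets, top=3):
    """Approximate nearest neighbors: rank only the words in x's bucket."""
    candidates = buckets.get(hash_key(x), [])
    return sorted(candidates, key=lambda w: (edit_distance(x, w), w))[:top]


# Toy string codes standing in for word images (hypothetical data).
codes = ["abcab", "abcbb", "bbcab", "cacca", "ccaca", "acbca"]
hash_key, buckets = build_index(codes)
print(query("abcab", hash_key, buckets))
```

Because a query is compared only against the words in its own bucket, retrieval is approximate: a true neighbor that hashes to a different key is missed. Schemes of this kind usually offset that by building several independent hash tables and merging their candidates.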
Keywords :
cryptography; file organisation; image representation; image retrieval; indexing; string matching; unsupervised learning; word processing; Bengali script; Devanagari script; approximate nearest neighbor based retrieval; document image collections; edit distance based hashing; graphical primitives; hash space; hashing function; image representation; object projection; string codes; string matching; unsupervised learning; word image based document indexing scheme; Equations; Image representation; Image segmentation; Indexing; Shape; Text analysis; Distance based hashing; Document image indexing; Edit distance; Shape descriptor;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location :
Beijing
ISSN :
1520-5363
Print_ISBN :
978-1-4577-1350-7
Electronic_ISBN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2011.242
Filename :
6065500