DocumentCode :
2020948
Title :
Locality Sensitive Pseudo-Code for Document Images
Author :
Terasawa, Kengo ; Tanaka, Yuzuru
Author_Institution :
Hokkaido Univ., Sapporo
Volume :
1
fYear :
2007
fDate :
23-26 Sept. 2007
Firstpage :
73
Lastpage :
77
Abstract :
In this paper, we propose a novel scheme for representing character string images in scanned documents. We converted conventional multi-dimensional descriptors into pseudo-codes with the following property: if two vectors are close in the original space, their encoded pseudo-codes are semi-equivalent with high probability. For this conversion, we combined locality sensitive hashing (LSH) indices, and we also developed a new family of LSH functions that is superior to earlier ones when all vectors are constrained to lie on the surface of the unit sphere. Word spotting based on our pseudo-code is faster than the multi-dimensional descriptor-based method while scarcely degrading accuracy.
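The abstract does not specify the new LSH family, so the sketch below substitutes the standard random-hyperplane LSH (Charikar's SimHash), a well-known LSH family for angular distance between vectors on the unit sphere, to illustrate the general idea of turning descriptors into pseudo-codes. Several independent hash tables are combined into one pseudo-code per descriptor. Everything here is an illustrative assumption and not the authors' scheme: the function names (`make_tables`, `pseudo_code`, `semi_equivalent`, `collision_rates`), the parameters `DIM`, `N_BITS` and `N_TABLES`, and the reading of "semi-equivalent" as "collide in at least one table".

```python
import numpy as np

# Hypothetical parameters, chosen for illustration only.
DIM = 64        # descriptor dimensionality (assumed)
N_BITS = 8      # hash bits per table (assumed)
N_TABLES = 4    # independent LSH tables combined into one pseudo-code (assumed)


def make_tables(dim, n_bits, n_tables, seed=0):
    """Draw random Gaussian hyperplane normals: one (n_bits x dim) matrix per table."""
    rng = np.random.default_rng(seed)
    return [rng.standard_normal((n_bits, dim)) for _ in range(n_tables)]


def hash_one(v, planes):
    """Random-hyperplane LSH: bit i is 1 iff v lies on the non-negative side of plane i."""
    bits = planes @ v >= 0
    code = 0
    for b in bits:
        code = (code << 1) | int(b)
    return code


def pseudo_code(v, tables):
    """Combine the per-table hash values into one pseudo-code (a tuple of ints)."""
    return tuple(hash_one(v, planes) for planes in tables)


def semi_equivalent(a, b):
    """Assumed matching rule: two pseudo-codes match if any table's hash collides."""
    return any(x == y for x, y in zip(a, b))


def unit(v):
    """Project a vector onto the unit sphere."""
    return v / np.linalg.norm(v)


def collision_rates(tables, n_pairs=200, noise=0.01, seed=1):
    """Fraction of near pairs vs. random pairs whose pseudo-codes are semi-equivalent."""
    rng = np.random.default_rng(seed)
    dim = tables[0].shape[1]
    near = far = 0
    for _ in range(n_pairs):
        x = unit(rng.standard_normal(dim))
        x_near = unit(x + noise * rng.standard_normal(dim))  # small perturbation on the sphere
        y = unit(rng.standard_normal(dim))                   # unrelated random point
        cx = pseudo_code(x, tables)
        near += semi_equivalent(cx, pseudo_code(x_near, tables))
        far += semi_equivalent(cx, pseudo_code(y, tables))
    return near / n_pairs, far / n_pairs


tables = make_tables(DIM, N_BITS, N_TABLES)
near_rate, far_rate = collision_rates(tables)
print(f"near pairs matched: {near_rate:.2f}, random pairs matched: {far_rate:.2f}")
```

Because each pseudo-code is a short tuple of integers, matching reduces to equality tests (or hash-table lookups) instead of distance computations on the original descriptors. In this substitute scheme, raising `N_BITS` makes each table more selective (fewer chance collisions between unrelated vectors), while raising `N_TABLES` restores the probability that genuinely close vectors still collide somewhere.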
Keywords :
codes; document image processing; image representation; image scanners; optical character recognition; character string images; document images; locality sensitive hashing indices; locality sensitive pseudocode; multidimensional descriptors; word spotting; Computational efficiency; Data mining; Degradation; Document image processing; Encoding; Image analysis; Image converters; Image sequence analysis; Laboratories; Vector quantization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
Conference_Location :
Parana
ISSN :
1520-5363
Print_ISBN :
978-0-7695-2822-9
Type :
conf
DOI :
10.1109/ICDAR.2007.4378678
Filename :
4378678