Title :
Locality Sensitive Pseudo-Code for Document Images
Author :
Terasawa, Kengo ; Tanaka, Yuzuru
Author_Institution :
Hokkaido Univ., Sapporo
Abstract :
In this paper, we propose a novel scheme for representing character string images in scanned documents. We convert conventional multi-dimensional descriptors into pseudo-codes that have the following property: if two vectors are near in the original space, then their encoded pseudo-codes are 'semi-equivalent' with high probability. For this conversion, we combine locality sensitive hashing (LSH) indices, and we also develop a new family of LSH functions that is superior to earlier ones when all vectors are constrained to lie on the surface of the unit sphere. Word spotting based on our pseudo-code is faster than the multi-dimensional descriptor-based method while scarcely degrading accuracy.
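The general idea of turning a descriptor vector into an LSH-based pseudo-code can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: it uses classic random-hyperplane LSH (sign of the projection onto random Gaussian directions), which is a standard family for unit-sphere vectors but is NOT the new LSH family proposed in the paper. The function names (make_hyperplanes, pseudo_code, semi_equivalent), the table and bit counts, and the matching rule "two pseudo-codes are semi-equivalent if at least one table key agrees" are all hypothetical choices made for illustration.

```python
import numpy as np


def make_hyperplanes(dim, n_tables, bits_per_table, seed=0):
    """Draw random Gaussian hyperplanes: one (bits_per_table x dim) matrix per hash table."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_tables, bits_per_table, dim))


def pseudo_code(vec, planes):
    """Map a descriptor vector to a tuple of integer keys, one key per hash table.

    Assumption: random-hyperplane LSH, not the paper's own LSH family.
    Each key packs the sign bits of the projections onto that table's hyperplanes.
    """
    v = np.asarray(vec, dtype=float)
    v = v / np.linalg.norm(v)          # constrain the vector to the unit sphere
    bits = (planes @ v) >= 0           # shape (n_tables, bits_per_table), boolean
    weights = 1 << np.arange(bits.shape[1])  # 1, 2, 4, ... for packing bits into an int
    return tuple(int(k) for k in bits.astype(int) @ weights)


def semi_equivalent(code_a, code_b):
    """Hypothetical matching rule: codes match if the keys agree in any table."""
    return any(a == b for a, b in zip(code_a, code_b))
```

For example, with planes = make_hyperplanes(dim=16, n_tables=8, bits_per_table=6), a descriptor and a slightly perturbed copy of it almost always share most table keys, so semi_equivalent returns True for them, while unrelated descriptors rarely collide. Because each pseudo-code is a short tuple of integers, matching reduces to hash lookups instead of distance computations over the full descriptor, which is the kind of speed-up the abstract refers to.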
Keywords :
codes; document image processing; image representation; image scanners; optical character recognition; character string images; document images; locality sensitive hashing indices; locality sensitive pseudocode; multidimensional descriptors; word spotting; Computational efficiency; Data mining; Degradation; Document image processing; Encoding; Image analysis; Image converters; Image sequence analysis; Laboratories; Vector quantization;
Conference_Title :
Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
Conference_Location :
Parana
Print_ISBN :
978-0-7695-2822-9
DOI :
10.1109/ICDAR.2007.4378678