DocumentCode
2142288
Title
Browsing Heterogeneous Document Collections by a Segmentation-Free Word Spotting Method
Author
Rusiñol, Marçal ; Aldavert, David ; Toledo, Ricardo ; Lladós, Josep
Author_Institution
Dept. Cienc. de la Computacio, Univ. Autonoma de Barcelona, Bellaterra, Spain
fYear
2011
fDate
18-21 Sept. 2011
Firstpage
63
Lastpage
67
Abstract
In this paper, we present a segmentation-free word spotting method that can handle heterogeneous collections of document images. We propose a patch-based framework in which each patch is represented by a bag-of-visual-words model built on SIFT descriptors. The resulting feature vectors are then refined by applying latent semantic indexing. The proposed method performs well on both handwritten and typewritten historical document images, and we have also tested it on documents written in non-Latin scripts.
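As a rough illustration of the pipeline the abstract describes, the following Python/NumPy sketch encodes patches as bag-of-visual-words histograms, refines them with latent semantic indexing via a truncated SVD, and ranks patches against a query by cosine similarity. It is not the authors' implementation: SIFT extraction and the dense patch grid over each page are omitted, the descriptors and codebook are random stand-ins, raw word counts are used without any weighting, and the function names (`bovw_histogram`, `fit_lsi`, `fold_in`, `rank_patches`), the vocabulary size, and the number of latent topics are hypothetical choices.

```python
import numpy as np


def bovw_histogram(descriptors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Quantize each descriptor to its nearest visual word and count occurrences.

    descriptors: (n, d) local descriptors from one patch (e.g. SIFT, d=128).
    codebook:    (k, d) visual words, assumed learned beforehand (e.g. by k-means).
    """
    # Squared Euclidean distance from every descriptor to every codeword: (n, k).
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)
    return np.bincount(words, minlength=len(codebook)).astype(float)


def fit_lsi(term_patch: np.ndarray, n_topics: int):
    """Truncated SVD of the (k words x m patches) matrix, as in LSI."""
    U, S, Vt = np.linalg.svd(term_patch, full_matrices=False)
    U_k, S_k = U[:, :n_topics], S[:n_topics]
    # Patch coordinates in the latent space: rows of V_k, shape (m, n_topics).
    patches_latent = Vt[:n_topics].T
    return U_k, S_k, patches_latent


def fold_in(hist: np.ndarray, U_k: np.ndarray, S_k: np.ndarray) -> np.ndarray:
    """Project a BoVW histogram into the latent space: S_k^{-1} U_k^T q."""
    return (U_k.T @ hist) / S_k


def rank_patches(query_latent: np.ndarray, patches_latent: np.ndarray):
    """Return patch indices sorted by decreasing cosine similarity, plus the scores."""
    norms = np.linalg.norm(patches_latent, axis=1) * np.linalg.norm(query_latent)
    sims = (patches_latent @ query_latent) / np.maximum(norms, 1e-12)
    return np.argsort(-sims), sims


# Usage with synthetic data (all sizes are hypothetical).
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 128))                        # 8 visual words
patches = [rng.normal(size=(50, 128)) for _ in range(6)]    # 6 patches, 50 descriptors each
A = np.stack([bovw_histogram(p, codebook) for p in patches], axis=1)  # (8, 6)
U_k, S_k, P = fit_lsi(A, n_topics=3)

# Use patch 2 as the query and rank all patches against it.
q = fold_in(bovw_histogram(patches[2], codebook), U_k, S_k)
order, sims = rank_patches(q, P)
print(order)
```

Folding a query in with S_k^{-1} U_k^T q places it in the same coordinates as the rows of V_k, so a stored patch's own histogram maps onto its latent vector and ranks first; in a real spotting setting the query would come from a cropped word image rather than a stored patch.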
Keywords
document image processing; feature extraction; handwriting recognition; indexing; word processing; SIFT descriptors; bag of visual word model; feature vectors; handwritten historical document images; heterogeneous document image collections; latent semantic indexing technique; nonLatin scripts; patch based framework; segmentation free word spotting method; typewritten historical document images; Feature extraction; Hidden Markov models; Image segmentation; Indexing; Large scale integration; Semantics; Visualization; Dense SIFT Features; Heterogeneous Document Collections; Latent Semantic Indexing; Word Spotting;
fLanguage
English
Publisher
IEEE
Conference_Title
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location
Beijing
ISSN
1520-5363
Print_ISBN
978-1-4577-1350-7
Electronic_ISBN
1520-5363
Type
conf
DOI
10.1109/ICDAR.2011.22
Filename
6065277