DocumentCode :
2022354
Title :
Indexing Historical Documents by Word Shape Signatures
Author :
Lladós, Josep ; Sánchez, Gemma
Author_Institution :
Univ. Autonoma de Barcelona, Barcelona
Volume :
1
fYear :
2007
fDate :
23-26 Sept. 2007
Firstpage :
362
Lastpage :
366
Abstract :
This paper presents a word-spotting approach for indexing archival document images. Indices are constructed from keyword images, and the spotting strategy is formulated on an indexing-by-shape basis. The well-known shape context descriptor is used to compute word image signatures from skeleton points. Codewords are then extracted from thresholded shape contexts, giving a simpler and more compact representation based on bit vectors. Document images are roughly segmented into words and a lookup table is constructed, with each word subimage taken as a bin. Keyword images are spotted in documents by a voting strategy: the codewords index into the lookup table and cast votes in the corresponding bins. The approach is illustrated with a real application scenario consisting of documents from a digital archive of the Spanish Civil War.
Keywords :
document image processing; history; image coding; image segmentation; indexing; codewords; image documents; indexing historical documents; indexing-by-shape basis; lookup table; roughly segmented; thresholded shape contexts; voting strategy; word shape signatures; word spotting approach; word subimage; Computer vision; Image analysis; Image recognition; Image segmentation; Indexing; Shape; Skeleton; Table lookup; Text analysis; Voting;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
Conference_Location :
Parana
ISSN :
1520-5363
Print_ISBN :
978-0-7695-2822-9
Type :
conf
DOI :
10.1109/ICDAR.2007.4378733
Filename :
4378733