Title :
Keyword Spotting in Document Images through Word Shape Coding
Author :
Bai, Shuyong ; Li, Linlin ; Tan, Chew Lim
Author_Institution :
Sch. of Comput., Nat. Univ. of Singapore, Singapore, Singapore
Abstract :
With large databases of document images available, a method for users to find keywords in documents would be useful. One approach is to perform Optical Character Recognition (OCR) on each document and then index the resulting text. However, if the quality of the document is poor or time is critical, complete OCR of all images is infeasible. This paper builds upon previous work on Word Shape Coding to propose an alternative technique and a combination of feature descriptors for keyword spotting without the use of OCR. Different sequence alignment similarity measures can be used for partial or whole word matching. The proposed technique is tolerant of serifs, font styles and certain degrees of touching, broken or overlapping characters. It improves on previous work with not only better precision and a lower collision rate but, more importantly, the ability to perform partial matching. Experimental results show that it is about 15 times faster than OCR. It is a promising technique for improving document image retrieval.
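The distinction between whole and partial word matching over shape codes can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the three-symbol coding (`A` for ascenders, capitals and digits, `D` for descenders, `x` for everything else), the function names `shape_code`, `global_score` and `local_score`, and the scoring weights. The code works on text strings to stand in for codes that a real system would extract from word images. Global alignment (Needleman-Wunsch) is used as one plausible whole-word measure and local alignment (Smith-Waterman) as one plausible partial-match measure.

```python
# Hypothetical toy word shape coding; the paper's actual feature
# descriptors are not specified in the abstract.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")


def shape_code(word):
    """Map each character to 'A' (tall), 'D' (descending) or 'x' (x-height)."""
    out = []
    for ch in word:
        if ch.isupper() or ch.isdigit() or ch in ASCENDERS:
            out.append("A")
        elif ch in DESCENDERS:
            out.append("D")
        else:
            out.append("x")
    return "".join(out)


def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score: both codes must align end to end."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman score: best-matching pair of substrings."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            val = max(0, prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
            cur.append(val)
            best = max(best, val)
        prev = cur
    return best


query = shape_code("spot")
target = shape_code("spotting")
print(query, target)
print("global:", global_score(query, target))
print("local:", local_score(query, target))
```

With these assumed weights, the local score rewards the query code appearing inside a longer word code, while the global score penalises the unmatched remainder. Note that a coding this coarse maps different words such as "spot" and "spat" to the same code, which is the kind of collision the abstract says richer feature descriptors reduce.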
Keywords :
document image processing; feature extraction; image coding; image matching; optical character recognition; shape recognition; visual databases; document image database; feature descriptor; feature extraction; indexing; keyword spotting; optical character recognition; sequence alignment similarity measure; word matching; word shape coding; Delay; Handwriting recognition; Image coding; Ink; Robustness; Shape; Support vector machine classification; Support vector machines; Text analysis; Voting; Document Image Retrieval; Keyword Spotting; Partial Word Matching; Word Searching; Word Shape Coding;
Conference_Title :
Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
Conference_Location :
Barcelona
Print_ISBN :
978-1-4244-4500-4
Electronic_ISBN :
1520-5363
DOI :
10.1109/ICDAR.2009.54