DocumentCode
1122641
Title
Information retrieval in document image databases
Author
Lu, Yue ; Tan, Chew Lim
Author_Institution
Dept. of Comput. Sci. & Technol., East China Normal Univ., Shanghai, China
Volume
16
Issue
11
fYear
2004
Firstpage
1398
Lastpage
1410
Abstract
With the rising popularity and importance of document images as an information source, information retrieval in document image databases has become a growing and challenging problem. In this paper, we propose an approach capable of matching partial word images to address two issues in document image retrieval: word spotting and similarity measurement between documents. First, each word image is represented by a primitive string. Then, an inexact string matching technique is used to measure the similarity between the primitive strings generated from two word images. Based on this similarity, we can estimate how relevant one word image is to the other and thereby decide whether one is a portion of the other. To deal with various character fonts, we represent each word image by a primitive string that is tolerant of serif and font differences. Using inexact string matching, our method can also handle heavily touching characters. Experimental results on a variety of document image databases confirm the feasibility, validity, and efficiency of the proposed approach in document image retrieval.
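The partial matching idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes each primitive is a single hashable token (here, a character), uses unit edit costs, and the names `partial_match_cost` and `partial_similarity` are hypothetical. It scores how well a query primitive string aligns with any substring of a longer one, which is one way to decide whether one word image may be a portion of another.

```python
def partial_match_cost(query, word):
    """Minimum edit cost of aligning `query` with ANY substring of `word`.

    Assumption: unit cost for insertion, deletion, and substitution.
    The paper's actual primitive costs are not given in this record.
    """
    # prev[j]: best cost of aligning the query prefix processed so far
    # with some substring of `word` that ends at position j.
    prev = [0] * (len(word) + 1)  # an empty query matches anywhere for free
    for i in range(1, len(query) + 1):
        # Column 0: query[:i] against an empty substring costs i deletions.
        curr = [i] + [0] * len(word)
        for j in range(1, len(word) + 1):
            substitute = prev[j - 1] + (query[i - 1] != word[j - 1])
            delete = prev[j] + 1      # drop a query primitive
            insert = curr[j - 1] + 1  # skip an extra primitive in `word`
            curr[j] = min(substitute, delete, insert)
        prev = curr
    # The match may end at any position in `word`.
    return min(prev)


def partial_similarity(query, word):
    """Normalise the cost to [0, 1]; 1.0 means `query` occurs exactly in `word`."""
    if not query:
        return 1.0
    return 1.0 - partial_match_cost(query, word) / len(query)
```

Because the match may start and end anywhere in the longer string, extra primitives before or after the query (as with touching characters or a query that is only part of a word) do not add to the cost. A retrieval system would then compare `partial_similarity` against a threshold, whose value is a tuning choice not specified here.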
Keywords
character sets; document handling; image retrieval; string matching; visual databases; character fonts; document image database; document image retrieval; document similarity measurement; information retrieval; partial word image matching; word searching; word spotting; Digital images; Document image processing; Electronics packaging; Image converters; Image databases; Image retrieval; Information retrieval; Internet; Optical character recognition software; Paper technology; Index Terms: Document image retrieval; partial word image matching; primitive string; word searching; document similarity measurement
fLanguage
English
Journal_Title
Knowledge and Data Engineering, IEEE Transactions on
Publisher
IEEE
ISSN
1041-4347
Type
jour
DOI
10.1109/TKDE.2004.76
Filename
1339266