DocumentCode
2030620
Title
Word spotting in scanned images using hidden Markov models
Author
Chen, Francine R. ; Wilcox, Lynn D. ; Bloomberg, Dan S.
Author_Institution
Xerox Palo Alto Res. Center, CA, USA
Volume
5
fYear
1993
fDate
27-30 April 1993
Firstpage
1
Abstract
A hidden-Markov-model (HMM)-based system for font-independent spotting of user-specified keywords in a scanned image is described. Word bounding boxes of potential keywords are extracted from the image using a morphology-based preprocessor. Feature vectors based on the external shape and internal structure of the word are computed over vertical columns of pixels in a word bounding box. For each user-specified keyword, an HMM is created by concatenating appropriate context-dependent character HMMs. Nonkeywords are modeled using an HMM based on context-dependent subcharacter models. Keyword spotting is performed using a Viterbi search through the HMM network created by connecting the keyword and nonkeyword HMMs in parallel. Applications of word-image spotting include information filtering in images from facsimile and copy machines, and information retrieval from text image databases.
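The parallel keyword/nonkeyword decision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each pixel column has already been quantized to a discrete symbol (0, 1, or 2), uses a toy three-state left-to-right keyword model, and stands in a single-state uniform filler for the subcharacter-based nonkeyword model. All names (`left_to_right`, `viterbi_log_score`, `spot`, `models`) are hypothetical. With equal entry probabilities, a Viterbi search through parallel branches reduces to picking the branch with the best single-model Viterbi score, which is what `spot` does.

```python
import math

NEG_INF = float("-inf")

def left_to_right(emit_probs, stay=0.5):
    """Build log-domain parameters for a left-to-right HMM (assumed topology):
    each state either stays (prob. `stay`) or advances to the next state."""
    n = len(emit_probs)
    log_trans = [[NEG_INF] * n for _ in range(n)]
    for i in range(n):
        log_trans[i][i] = math.log(stay)
        if i + 1 < n:
            log_trans[i][i + 1] = math.log(1 - stay)
    log_emit = [[math.log(p) if p > 0 else NEG_INF for p in row]
                for row in emit_probs]
    return log_trans, log_emit

def viterbi_log_score(obs, log_trans, log_emit):
    """Best-path log-probability for a path that starts in state 0
    and ends in the last state."""
    n = len(log_trans)
    score = [NEG_INF] * n
    score[0] = log_emit[0][obs[0]]
    for o in obs[1:]:
        score = [
            max(score[i] + log_trans[i][j] for i in range(n)) + log_emit[j][o]
            for j in range(n)
        ]
    return score[-1]

def spot(obs, models):
    """Score every branch of the parallel network and return the winner."""
    scores = {name: viterbi_log_score(obs, lt, le)
              for name, (lt, le) in models.items()}
    return max(scores, key=scores.get), scores

# Toy models (illustrative numbers only, not taken from the paper).
models = {
    "keyword": left_to_right([[0.8, 0.1, 0.1],
                              [0.1, 0.8, 0.1],
                              [0.1, 0.1, 0.8]]),
    "nonkeyword": left_to_right([[1 / 3, 1 / 3, 1 / 3]]),
}

label_a, scores_a = spot([0, 0, 1, 1, 2, 2], models)  # matches keyword shape
label_b, scores_b = spot([2, 1, 0, 2, 1, 0], models)  # does not match
```

In this sketch a word image is accepted as the keyword only when the keyword branch outscores the filler branch; a real system would use continuous feature vectors and trained model parameters.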
Keywords
hidden Markov models; image segmentation; mathematical morphology; optical character recognition; search problems; HMM network; Viterbi search; context-dependent subcharacter models; font-independent spotting; morphology-based preprocessor; scanned images; user-specified keywords; word bounding box; word-image spotting
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 1993. ICASSP-93., 1993 IEEE International Conference on
Conference_Location
Minneapolis, MN, USA
ISSN
1520-6149
Print_ISBN
0-7803-7402-9
Type
conf
DOI
10.1109/ICASSP.1993.319732
Filename
319732