Title :
Detecting and locating partially specified keywords in scanned images using hidden Markov models
Author :
Chen, Francine R. ; Wilcox, Lynn D. ; Bloomberg, Dan S.
Author_Institution :
Xerox Palo Alto Res. Center, CA, USA
Abstract :
A hidden Markov model (HMM) based system for detecting and locating, or spotting, user-specified keywords in scanned images is described. The system is font-independent, and no pre-segmentation of text and graphics is required. The bounding boxes of potential lines of text are extracted from the image using morphology. Feature vectors based on the external shape and internal structure of characters are computed for each bounding box. A keyword HMM is created by concatenating appropriate context-dependent character HMMs. The non-keyword HMM is based on context-dependent sub-character models. Keywords are spotted using Viterbi decoding on an HMM network created from the keyword and non-keyword HMMs. This model allows detection of keywords embedded in a line without pre-segmentation of the line into words or characters. Thus keywords may be specified by a baseform, and variants of the keyword can be detected.
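The spotting step described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the observations here are plain characters rather than image feature vectors, the non-keyword model is a single filler state rather than context-dependent sub-character models, and every name and probability (`build_network`, `spot_keyword`, `p_enter`, `p_match`, `ALPHABET`) is an assumption made for illustration. What it does show is how Viterbi decoding over a joint keyword and non-keyword network locates a keyword inside a line without first segmenting the line into words.

```python
import math

NEG_INF = float("-inf")
ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # assumed toy observation symbols


def _log(p):
    return math.log(p) if p > 0 else NEG_INF


def build_network(keyword, p_enter=0.01, p_match=0.99):
    """State 0 is the filler (non-keyword); states 1..K spell the keyword."""
    n = len(keyword) + 1
    init = [_log(1 - p_enter), _log(p_enter)] + [NEG_INF] * (n - 2)
    trans = [[NEG_INF] * n for _ in range(n)]
    trans[0][0] = _log(1 - p_enter)  # stay in filler
    trans[0][1] = _log(p_enter)      # enter the keyword model
    for k in range(1, n):
        # Left-to-right keyword chain; the last state returns to the filler.
        trans[k][k + 1 if k + 1 < n else 0] = 0.0
    p_other = (1 - p_match) / (len(ALPHABET) - 1)
    emit = [[_log(1 / len(ALPHABET))] * len(ALPHABET)]  # uniform filler
    for ch in keyword:
        emit.append([_log(p_match if a == ch else p_other) for a in ALPHABET])
    return init, trans, emit


def viterbi(log_init, log_trans, log_emit, obs):
    """Log-domain Viterbi: return the most likely state path for obs."""
    n = len(log_init)
    score = [log_init[s] + log_emit[s][obs[0]] for s in range(n)]
    back = []
    for o in obs[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + log_trans[r][s])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][s] + log_emit[s][o])
        back.append(ptr)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def spot_keyword(text, keyword):
    """Return (start, end) spans, end exclusive, decoded as the keyword."""
    obs = [ALPHABET.index(c) for c in text]
    path = viterbi(*build_network(keyword), obs)
    spans, start = [], None
    for t, s in enumerate(path):
        if s == 1:
            start = t
        if s == len(keyword) and start is not None:
            spans.append((start, t + 1))
            start = None
    return spans


print(spot_keyword("the cat sat", "cat"))
```

In this sketch the keyword is found because the path through the keyword states scores higher than staying in the filler; raising `p_enter` or lowering `p_match` makes the network more tolerant of near matches, at the cost of more false alarms.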
Keywords :
Viterbi decoding; feature extraction; hidden Markov models; word processing; bounding box; bounding boxes; context-dependent character HMMs; context-dependent sub-character models; external shape; feature vectors; font-independent; hidden Markov model; internal structure; keyword HMM; morphology; partially specified keywords; scanned images; user-specified keywords; Character recognition; Facsimile; Graphics; Hidden Markov models; Image recognition; Image retrieval; Image segmentation; Information retrieval; Morphology; Text recognition
Conference_Title :
Document Analysis and Recognition, 1993., Proceedings of the Second International Conference on
Conference_Location :
Tsukuba Science City
Print_ISBN :
0-8186-4960-7
DOI :
10.1109/ICDAR.1993.395765