Word spotting in scanned images using hidden Markov models

Author

Chen, Francine R. ; Wilcox, Lynn D. ; Bloomberg, Dan S.

Author_Institution

Xerox Palo Alto Res. Center, CA, USA

Volume

5

fYear

1993

fDate

27-30 April 1993

Firstpage

1

Abstract

A hidden-Markov-model (HMM)-based system for font-independent spotting of user-specified keywords in a scanned image is described. Word bounding boxes of potential keywords are extracted from the image using a morphology-based preprocessor. Feature vectors based on the external shape and internal structure of the word are computed over vertical columns of pixels in a word bounding box. For each user-specified keyword, an HMM is created by concatenating appropriate context-dependent character HMMs. Nonkeywords are modeled using an HMM based on context-dependent subcharacter models. Keyword spotting is performed using a Viterbi search through the HMM network created by connecting the keyword and nonkeyword HMMs in parallel. Applications of word-image spotting include information filtering in images from facsimile and copy machines, and information retrieval from text image databases.
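
The parallel keyword/nonkeyword decision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes discrete observation symbols in place of the paper's column feature vectors, uses made-up toy probabilities, and treats each branch as one small HMM rather than a concatenation of character or subcharacter models. The names `viterbi_log_score`, `spot`, and `MODELS` are hypothetical. With equal entry probabilities for the two branches, a Viterbi search through the parallel network reduces to picking the branch whose best path scores highest.

```python
import math

NEG_INF = float("-inf")


def log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi_log_score(obs, start, trans, emit):
    """Log-probability of the single best state path for `obs` under one HMM.

    start[i], trans[i][j] and emit[i][symbol] are plain probabilities.
    The path may end in any state (a simplification for this sketch).
    """
    n = len(start)
    score = [log(start[i]) + log(emit[i].get(obs[0], 0.0)) for i in range(n)]
    for symbol in obs[1:]:
        score = [
            max(score[i] + log(trans[i][j]) for i in range(n))
            + log(emit[j].get(symbol, 0.0))
            for j in range(n)
        ]
    return max(score)


def spot(obs, models):
    """Score every branch of the parallel network and return the best label."""
    scores = {
        label: viterbi_log_score(obs, *hmm) for label, hmm in models.items()
    }
    return max(scores, key=scores.get), scores


# Toy models (assumed values): a two-state left-to-right "keyword" HMM that
# expects symbols like a...b..., and a one-state "nonkeyword" filler HMM
# that emits every symbol with equal probability.
MODELS = {
    "keyword": (
        [1.0, 0.0],
        [[0.5, 0.5], [0.0, 1.0]],
        [{"a": 0.9, "b": 0.05, "c": 0.05}, {"a": 0.05, "b": 0.9, "c": 0.05}],
    ),
    "nonkeyword": (
        [1.0],
        [[1.0]],
        [{"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}],
    ),
}

if __name__ == "__main__":
    for word in (["a", "a", "b", "b"], ["c", "c", "c", "c"]):
        label, _ = spot(word, MODELS)
        print(word, "->", label)
```

In this sketch a word image is flagged as a keyword only when the keyword branch explains its symbol sequence better than the filler branch, which is the role the nonkeyword model plays in the network described above.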

Keywords

hidden Markov models; image segmentation; mathematical morphology; optical character recognition; search problems; HMM network; Viterbi search; context-dependent subcharacter models; font-independent spotting; morphology-based preprocessor; scanned images; user-specified keywords; word bounding box; word-image spotting

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 1993. ICASSP-93., 1993 IEEE International Conference on

Conference_Location

Minneapolis, MN, USA

ISSN

1520-6149

Print_ISBN

0-7803-7402-9

Type

conf

DOI

10.1109/ICASSP.1993.319732

Filename

319732