• DocumentCode
    2030620
  • Title

    Word spotting in scanned images using hidden Markov models

  • Author

    Chen, Francine R. ; Wilcox, Lynn D. ; Bloomberg, Dan S.

  • Author_Institution
    Xerox Palo Alto Res. Center, CA, USA
  • Volume
    5
  • fYear
    1993
  • fDate
    27-30 April 1993
  • Firstpage
    1
  • Abstract
    A hidden-Markov-model (HMM)-based system for font-independent spotting of user-specified keywords in a scanned image is described. Word bounding boxes of potential keywords are extracted from the image using a morphology-based preprocessor. Feature vectors based on the external shape and internal structure of the word are computed over vertical columns of pixels in a word bounding box. For each user-specified keyword, an HMM is created by concatenating appropriate context-dependent character HMMs. Nonkeywords are modeled using an HMM based on context-dependent subcharacter models. Keyword spotting is performed using a Viterbi search through the HMM network created by connecting the keyword and nonkeyword HMMs in parallel. Applications of word-image spotting include information filtering in images from facsimile and copy machines, and information retrieval from text image databases.
  • Keywords
    hidden Markov models; image segmentation; mathematical morphology; optical character recognition; search problems; HMM network; Viterbi search; context-dependent subcharacter models; font-independent spotting; morphology-based preprocessor; scanned images; user-specified keywords; word bounding box; word-image spotting
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 1993. ICASSP-93., 1993 IEEE International Conference on
  • Conference_Location
    Minneapolis, MN, USA
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-7402-9
  • Type

    conf

  • DOI
    10.1109/ICASSP.1993.319732
  • Filename
    319732