• DocumentCode
    1486843
  • Title
    Incorporating language syntax in visual text recognition with a statistical model
  • Author
    Hull, Jonathan J.
  • Author_Institution
    Ricoh California Res. Center, Menlo Park, CA, USA
  • Volume
    18
  • Issue
    12
  • fYear
    1996
  • fDate
    12/1/1996 12:00:00 AM
  • Firstpage
    1251
  • Lastpage
    1255
  • Abstract
    The use of a statistical language model to improve the performance of an algorithm for recognizing digital images of handwritten or machine-printed text is discussed. A word recognition algorithm first determines a set of words (called a neighborhood) from a lexicon that are visually similar to each input word image. Syntactic classifications for the words and the transition probabilities between those classifications are input to the Viterbi algorithm. For each sentence, the Viterbi algorithm determines the sequence of syntactic classes (the states of an underlying Markov process) that has the maximum a posteriori probability, given the observed neighborhoods. The performance of the word recognition algorithm is improved by removing from each neighborhood the words whose classes are not on the estimated state sequence. An experimental application is demonstrated with a neighborhood generation algorithm that produces a number of guesses about the identity of each word in a running text. The use of zero-, first-, and second-order transition probabilities and different levels of noise in estimating the neighborhood are explored.
  • Keywords
    Markov processes; Viterbi detection; grammars; image classification; image recognition; natural languages; optical character recognition; statistical analysis; Markov process; Viterbi algorithm; digital image recognition; handwritten text; language syntax; lexicon; machine-printed text; maximum a posteriori probability; neighborhood; statistical language model; syntactic class sequence; visual text recognition; word recognition algorithm; Character recognition; Computational modeling; Degradation; Hidden Markov models; Image analysis; Speech recognition; Text analysis; Text recognition
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type
    jour
  • DOI
    10.1109/34.546261
  • Filename
    546261
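
The pruning procedure summarized in the abstract can be illustrated with a small sketch. This is not the paper's implementation: it covers only the first-order case, and the function name `viterbi_prune`, the toy lexicon, the class labels, and all probabilities are hypothetical. In particular, the observation score used here (the fraction of a neighborhood's words that carry a given class) is an assumption made for illustration; the abstract does not say how the observation probabilities are estimated.

```python
import math


def viterbi_prune(neighborhoods, lexicon, trans, init, floor=1e-6):
    """Find the most likely syntactic class sequence and prune neighborhoods.

    neighborhoods: list of non-empty lists of candidate words, one per word image.
    lexicon: dict mapping each word to its set of syntactic classes.
    trans: dict mapping (previous_class, class) to a transition probability.
    init: dict mapping a class to its probability of starting a sentence.
    floor: probability used for transitions or starts missing from the tables.
    """
    # Assumed observation score: share of candidate words carrying each class.
    emissions = []
    for hood in neighborhoods:
        counts = {}
        for word in hood:
            for cls in lexicon[word]:
                counts[cls] = counts.get(cls, 0) + 1
        emissions.append({cls: n / len(hood) for cls, n in counts.items()})

    # Viterbi recursion in log space to avoid numerical underflow.
    best = {cls: math.log(init.get(cls, floor)) + math.log(p)
            for cls, p in emissions[0].items()}
    backpointers = []
    for emission in emissions[1:]:
        new_best, pointers = {}, {}
        for cls, p in emission.items():
            prev, score = max(
                ((q, s + math.log(trans.get((q, cls), floor)))
                 for q, s in best.items()),
                key=lambda pair: pair[1],
            )
            new_best[cls] = score + math.log(p)
            pointers[cls] = prev
        best = new_best
        backpointers.append(pointers)

    # Trace back from the best final state to recover the class sequence.
    state = max(best, key=best.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()

    # Keep only the words whose classes include the estimated class.
    pruned = [[w for w in hood if cls in lexicon[w]]
              for hood, cls in zip(neighborhoods, path)]
    return path, pruned


# Toy data, invented for this example.
lexicon = {"the": {"DET"}, "she": {"PRON"}, "dog": {"NOUN"},
           "dig": {"VERB"}, "barks": {"VERB"}, "marks": {"NOUN"}}
init = {"DET": 0.6, "PRON": 0.4}
trans = {("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.1,
         ("PRON", "VERB"): 0.8, ("PRON", "NOUN"): 0.2,
         ("NOUN", "VERB"): 0.7, ("NOUN", "NOUN"): 0.3,
         ("VERB", "NOUN"): 0.6, ("VERB", "VERB"): 0.4}
neighborhoods = [["the", "she"], ["dog", "dig"], ["barks", "marks"]]

path, pruned = viterbi_prune(neighborhoods, lexicon, trans, init)
print(path)    # ['DET', 'NOUN', 'VERB']
print(pruned)  # [['the'], ['dog'], ['barks']]
```

With these toy numbers each neighborhood of two visually similar candidates is reduced to a single word. A word that belongs to several classes is counted once per class, so the assumed scores need not sum to one; they act as relative weights rather than calibrated probabilities.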