• DocumentCode
    1169826
  • Title
    Hidden tree Markov models for document image classification
  • Author
    Diligenti, Michelangelo; Frasconi, Paolo; Gori, Marco
  • Author_Institution
    Dipt. di Ingegneria dell'Informazione, Siena Univ., Italy
  • Volume
    25
  • Issue
    4
  • fYear
    2003
  • fDate
    4/1/2003 12:00:00 AM
  • Firstpage
    519
  • Lastpage
    523
  • Abstract
    Classification is an important problem in image document processing and is often a preliminary step toward recognition, understanding, and information extraction. In this paper, the problem is formulated in the framework of concept learning and each category corresponds to the set of image documents with similar physical structure. We propose a solution based on two algorithmic ideas. First, we obtain a structured representation of images based on labeled XY-trees (this representation informs the learner about important relationships between image subconstituents). Second, we propose a probabilistic architecture that extends hidden Markov models for learning probability distributions defined on spaces of labeled trees. Finally, a successful application of this method to the categorization of commercial invoices is presented.
  • Keywords
    document image processing; hidden Markov models; image classification; image representation; learning (artificial intelligence); probability; trees (mathematics); commercial invoice categorization; concept learning; document image classification; hidden Markov models; hidden tree Markov models; image recognition; image representation; information extraction; labeled XY-trees; machine learning; probabilistic architecture; probability distributions; Data mining; Explosives; Feature extraction; Hidden Markov models; Image classification; Image recognition; Machine learning; Multi-layer neural network; Organizing; Probability distribution;
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type
    jour
  • DOI
    10.1109/TPAMI.2003.1190578
  • Filename
    1190578