DocumentCode
1169826
Title
Hidden tree Markov models for document image classification
Author
Diligenti, Michelangelo ; Frasconi, Paolo ; Gori, Marco
Author_Institution
Dipt. di Ingegneria dell'Informazione, Siena Univ., Italy
Volume
25
Issue
4
fYear
2003
fDate
4/1/2003 12:00:00 AM
Firstpage
519
Lastpage
523
Abstract
Classification is an important problem in image document processing and is often a preliminary step toward recognition, understanding, and information extraction. In this paper, the problem is formulated in the framework of concept learning and each category corresponds to the set of image documents with similar physical structure. We propose a solution based on two algorithmic ideas. First, we obtain a structured representation of images based on labeled XY-trees (this representation informs the learner about important relationships between image subconstituents). Second, we propose a probabilistic architecture that extends hidden Markov models for learning probability distributions defined on spaces of labeled trees. Finally, a successful application of this method to the categorization of commercial invoices is presented.
Keywords
document image processing; hidden Markov models; image classification; image representation; learning (artificial intelligence); probability; trees (mathematics); commercial invoice categorization; concept learning; document image classification; hidden Markov models; hidden tree Markov models; image recognition; image representation; information extraction; labeled XY-trees; machine learning; probabilistic architecture; probability distributions; Data mining; Explosives; Feature extraction; Hidden Markov models; Image classification; Image recognition; Machine learning; Multi-layer neural network; Organizing; Probability distribution;
fLanguage
English
Journal_Title
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publisher
IEEE
ISSN
0162-8828
Type
jour
DOI
10.1109/TPAMI.2003.1190578
Filename
1190578