Title :
Learning Rich Hidden Markov Models in Document Analysis: Table Location
Author :
Silva, Ana Costa E
Author_Institution :
Univ. of Edinburgh, Edinburgh, UK
Abstract :
Hidden Markov models (HMMs) are probabilistic graphical models for interdependent classification. In this paper we experiment with different ways of combining the components of an HMM for document analysis applications, in particular for finding tables in text. We show: a) how to integrate different document structure finders into the HMM; b) that transition probabilities should vary along the chain to embed general knowledge axioms of our field; c) that some emission energies can be selectively ignored; and d) that emission and transition probabilities can be weighted differently. We conclude that these changes increase the expressiveness and usability of HMMs in our field.
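As an illustration of points b) to d), the following is a minimal Viterbi decoding sketch in Python with NumPy. It is not the paper's implementation: the function name `viterbi`, its parameters (`use_emission`, `emit_weight`, `trans_weight`), and the choice to treat an ignored emission as a zero log-score are assumptions made for this example. Transition log-probabilities are supplied per position so they can vary along the chain, and the two weights scale emission and transition log-scores separately.

```python
import numpy as np

def viterbi(log_emit, log_trans, log_init,
            use_emission=None, emit_weight=1.0, trans_weight=1.0):
    """Most likely state path for a chain of T positions and S states.

    log_emit:     (T, S) emission log-scores per position and state.
    log_trans:    (T-1, S, S); log_trans[t, i, j] scores moving from state i
                  at position t to state j at position t+1, so transitions
                  may differ from one position to the next.
    log_init:     (S,) initial state log-probabilities.
    use_emission: optional (T,) booleans; where False, the emission at that
                  position is ignored (hypothetical handling: it adds 0).
    emit_weight, trans_weight: hypothetical scaling factors for the two
                  score types. Avoid a weight of 0 if the inputs contain -inf.
    """
    log_emit = np.asarray(log_emit, dtype=float)
    T, S = log_emit.shape
    if use_emission is None:
        use_emission = np.ones(T, dtype=bool)
    keep = np.asarray(use_emission, dtype=bool)[:, None]

    # Ignored positions contribute a zero log-score for every state.
    emit = emit_weight * np.where(keep, log_emit, 0.0)
    trans = trans_weight * np.asarray(log_trans, dtype=float)

    score = np.asarray(log_init, dtype=float) + emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + trans[t - 1]   # cand[i, j]: come from i, land in j
        back[t] = np.argmax(cand, axis=0)      # best predecessor for each j
        score = cand[back[t], np.arange(S)] + emit[t]

    # Trace the best path backwards from the best final state.
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

In a table-location setting, the states could be labels such as "text line" and "table line", with one chain position per document line. Raising `trans_weight` relative to `emit_weight` makes the decoder favour label continuity over the evidence of individual lines.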
Keywords :
hidden Markov models; learning (artificial intelligence); pattern classification; probability; text analysis; HMM; document analysis application; document structure finder; emission energy; hidden Markov model; interdependent classification; knowledge axiom; learning algorithm; probabilistic graphical model; text table location; transition probability; Classification tree analysis; Costs; Decision trees; Entropy; Graphical models; Hidden Markov models; Support vector machine classification; Support vector machines; Text analysis; Usability; Hidden Markov models (HMM); graphical models; table location
Conference_Title :
Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
Conference_Location :
Barcelona
Print_ISBN :
978-1-4244-4500-4
ISSN :
1520-5363
DOI :
10.1109/ICDAR.2009.185