Title :
Combining Static Classifiers and Class Syntax Models for Logical Entity Recognition in Scanned Historical Documents
Author :
Mao, Song ; Mansukhani, Praveer ; Thoma, George R.
Author_Institution :
U.S. Nat. Libr. of Med., Bethesda
Abstract :
Class syntax can be used to 1) model temporal or locational evolution of class labels of feature observation sequences, 2) correct classification errors of static classifiers if feature observations from different classes overlap in feature space, and 3) eliminate redundant features whose discriminative information is already represented in the class syntax. In this paper, we describe a novel method that combines static classifiers with class syntax models for supervised feature subset selection and classification in unified algorithms. Posterior class probabilities given feature observations are first estimated from the output of static classifiers, and then integrated into a parsing algorithm to find an optimal class label sequence for the given feature observation sequence. Finally, both static classifiers and class syntax models are used to search for an optimal subset of features. An optimal feature subset, associated static classifiers, and class syntax models are all learned from training data. We apply this method to logical entity recognition in scanned historical U.S. Food and Drug Administration (FDA) documents containing court case Notices of Judgments (NJs) of different layout styles, and show that the use of class syntax models not only corrects most classification errors of static classifiers, but also significantly reduces the dimensionality of feature observations with negligible impact on classification performance.
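The decoding step described in the abstract (classifier posteriors combined with a class syntax model to pick a label sequence) can be illustrated with a minimal sketch. It assumes the class syntax is a first-order Markov chain over labels and uses Viterbi decoding; the abstract does not specify the actual parsing algorithm, so this is a stand-in, not the authors' method. The function name `viterbi_decode`, the labels `title` and `body`, and all probabilities are hypothetical. As a simplification, posteriors are used directly as per-position scores rather than being converted to likelihoods.

```python
import math


def viterbi_decode(posteriors, transitions, initial):
    """Return the highest-scoring label sequence under a first-order syntax model.

    posteriors:  list of dicts, one per observation: label -> P(label | x_t)
    transitions: dict (prev, next) -> P(next | prev); missing pairs are disallowed
    initial:     dict label -> P(label at the first position)
    """
    def log(p):
        # Zero probability maps to -inf so disallowed paths never win.
        return math.log(p) if p > 0 else float("-inf")

    labels = list(initial)
    score = {c: log(initial[c]) + log(posteriors[0][c]) for c in labels}
    back = []  # back[t][c] = best predecessor of label c at position t + 1

    for post in posteriors[1:]:
        new_score, pointers = {}, {}
        for c in labels:
            best_prev = max(
                labels,
                key=lambda p: score[p] + log(transitions.get((p, c), 0.0)),
            )
            new_score[c] = (
                score[best_prev]
                + log(transitions.get((best_prev, c), 0.0))
                + log(post[c])
            )
            pointers[c] = best_prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final label.
    path = [max(labels, key=lambda c: score[c])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical toy data: three text zones, two logical labels.
posteriors = [
    {"title": 0.9, "body": 0.1},
    {"title": 0.2, "body": 0.8},
    {"title": 0.6, "body": 0.4},  # static classifier alone would say "title"
]
initial = {"title": 0.9, "body": 0.1}
# Assumed syntax: a body zone is never followed by a title zone.
transitions = {
    ("title", "title"): 0.2,
    ("title", "body"): 0.8,
    ("body", "body"): 1.0,
}

static_labels = [max(p, key=p.get) for p in posteriors]
decoded = viterbi_decode(posteriors, transitions, initial)
```

In this toy case the per-zone argmax labels the third zone `title`, while the decoded sequence labels it `body` because the assumed syntax forbids a `body` to `title` transition. This mirrors, on a small scale, the abstract's claim that a class syntax model can correct errors of a static classifier.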
Keywords :
document image processing; feature extraction; hidden Markov models; image classification; object recognition; support vector machines; class syntax models; feature observation sequences; logical entity recognition; parsing algorithm; scanned historical documents; static classifiers; supervised feature subset selection; Classification algorithms; Data mining; Error correction; Feature extraction; Hidden Markov models; Libraries; Power system modeling; Support vector machine classification; Support vector machines; Training data;
Conference_Title :
Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on
Conference_Location :
Minneapolis, MN
Print_ISBN :
1-4244-1179-3
Electronic_ISBN :
1063-6919
DOI :
10.1109/CVPR.2007.383253