DocumentCode :
2510890
Title :
Text Separation from Mixed Documents Using a Tree-Structured Classifier
Author :
Peng, Xujun ; Setlur, Srirangaraj ; Govindaraju, Venu ; Sitaram, Ramachandrula
Author_Institution :
Dept of Comput. Sci. & Eng., SUNY at Buffalo, Amherst, NY, USA
fYear :
2010
fDate :
23-26 Aug. 2010
Firstpage :
241
Lastpage :
244
Abstract :
In this paper, we propose a tree-structured multi-class classifier to identify annotations and overlapping text in machine-printed documents. Each node of the tree-structured classifier is a binary weak learner. Unlike a normal decision tree (DT), which considers only a subset of the training data at each node and is susceptible to over-fitting, the proposed tree is boosted using all of the training data at each node, with different weights. The proposed method is evaluated on a set of machine-printed documents that have been annotated by multiple writers in an office/collaborative environment.
Keywords :
decision trees; pattern classification; text analysis; binary weak learner; machine printed documents; mixed documents; normal decision tree; text separation; tree-structured classifier; tree-structured multiclass classifier; Artificial neural networks; Conferences; Decision trees; Hidden Markov models; Testing; Training; Training data; classification; decision tree; documents
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Pattern Recognition (ICPR), 2010 20th International Conference on
Conference_Location :
Istanbul
ISSN :
1051-4651
Print_ISBN :
978-1-4244-7542-1
Type :
conf
DOI :
10.1109/ICPR.2010.68
Filename :
5597583